ABSTRACT 

Diverse structural regularities identified in the genetic code, over the last four decades, 
have been unified with a model of code evolution that equated the time-order of amino 
acid (aa) addition to the code with the number of reaction steps (from the citrate cycle) 
required for its synthesis. To help establish the mechanism that coordinated growth of aa 
synthesis pathways with code formation, evidence was sought that tRNA adaptor 
molecules functioned as cofactors in aa synthesis, prior to appearance of the last 
common ancestor (LCA). 

Phylogenetic analysis of the conserved trace of ancestral tRNA revealed: (1) LCA tRNA 
species that descended from a common ancestor contained similar anticodons and were 
adaptors for aa synthesized from a common precursor. (2) LCA tRNA adaptors for sibling 
aa generally possessed the same core structure group and had elevated sequence 
identity. (3) Five domains of contiguous codons read by tRNA, sharing the same core 
group and bearing sibling aa, were spread along code rows. (4) These coding domains 
covered most of the code. They also overlapped extensively with code regions assigned 
to same-family aa and with regions read by phylogenetically-related LCA tRNA. (5) Type 
ID tRNA were adaptors for five aa, including three NH4+ fixers, synthesized on 1-2 step 
paths. A tRNA-ID was found to be ancestral to the type IA tRNA asparagine-adaptor, 
which, in turn, displayed kinship with adaptors for fourteen longer-path (4-14 steps) aa. 

Based on these observations, tRNA-dependent aa synthesis was credited with producing 
code domains by channeling nearest-neighbor codons, in code rows, to successive 
generations of synthetically related aa, during initial column-by-column code expansion. 
Putative cofactor/adaptor molecules normally appeared to diversify from a sibling aa 
tRNA; principally, an amide-amino-acid-adaptor. 



Introduction 

Since Crick's prescient conjecture that H-bonding adaptor molecules decoded template 
during protein synthesis, [1] tRNA has been found to participate in a range of processes 
beyond translation of a gene transcript. [2, 3] Synthesis of f-Met, Gln^, Asn^, Sec and 
Cys^, and some biomolecules unrelated to amino acids (aa), involve a tRNA cofactor. [4-9] 
tRNA also regulates gene transcription through an interaction with the T-box in a nascent 
mRNA molecule. [10] In addition, pre-tRNA catalyzes self-excision of an intron in the 
anticodon loop. [11] tRNA participation in aa synthesis, especially among various 
prokaryote species, supports the suggestion that an extensive network of RNA-dependent 
pathways once produced aa incorporated into proteins. Beyond this, growth of these early 
pathways conceivably helped shape the genetic code. [12] 

The proposition that growth of early aa synthesis pathways and code formation were 
linked appears far-sighted, when viewed from the perspective of the path-distance model 
of code evolution. [13-15] This model equates the time-order of aa addition to the code 
with the number of reactions (from the citrate cycle) required for synthesis and it has 
been shown to unify all significant features of code structure reported over the last four 
decades. They include: (1) Woese clusters involving NAN column triplets assigned to polar 
aa residues, and NUN triplets to hydrophobic residues. [16] (2) 5'-Base invariance among 
codons for synthetically related aa. [17] (3) Location of eight subdivided code boxes in the 
Standard Code. [18] (4) Allocation of intact code boxes to the six smallest aa incorporated 
into proteins. [17] (5) Presence of G/C enriched triplets in intact code boxes. [19] (6) 
Elevated coding capacity of codon mid-base. [20] (7) Dillon's codon-set subdivision rule. 
[21] (8) NAN column triplets encode four NH4+ fixer aa. [13] (9) Codon mid-base, among 
short-path (up to 7 steps) aa, correlates with path length. [15] (10) Use of mainly intact 
code boxes to encode short-path aa. [15] (11) Assignment of one codon doublet to a 
long-path aa and another to a short-path aa, in most subdivided code boxes. [15] (12) 
The preferential conservation of short-path aa residues, among proteins that evolved 
before the last common ancestor (LCA). [14] (13) The synthesis of acidic aa residues on 
short paths and basic residues on long paths. [13, 14] (14) Confinement of negatively 
charged aa residues to a single code box. [15] (15) Fall-off in C atom oxidation number 
with path length, among aa originating from citrate cycle components. [12, 15] (16) 
Source of Biro clusters involving NRN columns assigned to charged residues and NYN 
column triplets to hydrophobic residues. [15, 21] In addition, a frequency-shift model of 
code evolution, [22] proposing that early-comers to the code have elevated frequency in a 
residue sequence from an evolutionarily deep source, shows reasonable statistical 
agreement (p = 0.103) with the path-distance model. [15] Also in agreement with this 
model, [15] residue frequency shifts indicate GNN row triplets code for an earlier set of aa 
than ANN row triplets. [23] 

Evidence was sought in this endeavor that tRNA had a dual role as cofactors in aa 
synthesis and as adaptors in translation, before species diverged from the last common 
ancestor (LCA). The objective was to help clarify the molecular mechanism responsible 
for coordinating growth of aa synthesis pathways with code evolution. By focussing this 
investigation on analysis of the conserved trace of LCA tRNA in extant tRNA, post- 
divergence base substitutions were effectively 'filtered-out'. Since they contribute around 
two-thirds of the variability in extant adaptor sequences, [24] mutations from this source 
undoubtedly impeded earlier attempts to uncover the path of pre-LCA tRNA diversification 
directly from extant tRNA sequences. [25-27] An approach based on eliminating post- 
divergence variability prior to analysis of ancestral base sequences, however, presented 
some novel issues: (1) Information on pre-LCA tRNA evolution had to be extracted from 
the trace of LCA tRNA conserved in, at most, fifty sites within extant tRNA sequences, 
since almost twenty sites remained invariant throughout tRNA evolution. And, (2) analysis 
of LCA tRNA paralogs involved a set of molecules from a single taxonomic source, so no 
outgroups existed that could help locate the root of the phylogenetic tree obtained. 

Different types of tRNA present in the LCA are initially identified in this paper, following 
a detailed comparison of core structures and sequence identity levels. A family of 
synthetically related aa is then revealed to charge a particular kind of tRNA adaptor. 
Phylogenetic analysis of mutation distances and quaternary identity levels evident in the 
trace of LCA tRNA conserved within extant tRNA sequences corroborated this finding 
and also indicated that adaptors for same-family aa generally descended from a common 
ancestor. The structure of the genetic code is then interpreted from the perspective of 
these findings. This paper concludes by presenting aspects of aa synthesis bearing on the 
mechanism of pre-LCA aa synthesis and by relating diversification of aminoacyl-tRNA 
synthetases (aaRS) to code evolution. 

The results presented suggest that the genetic code evolved at a time when aa, 
synthesized from a common precursor, were attached to tRNA cofactor/adaptor molecules 
descended from a common ancestor. tRNA-dependent aa synthesis, integrated in this 
manner, is credited with channeling assignment of nearest-neighbor codons to same-family 
aa. These findings provide an explanation for the occurrence of five row-oriented 
coding domains identified in the genetic code. 



Results and Discussion 

Types and subtypes of LCA tRNA 

Figure 1 shows base sequences containing the trace of LCA tRNA conserved in forty-eight 
extant tRNA species, obtained from the Bayreuth database. [28] These sequences strictly 
conserved, on average, thirty-three sites (anticodon triplet, ACC acceptor stem omitted); 
each conserved site contained an identical base in the consensus sequences 
representative of a tRNA species in the three primary domains of life. Analysis of residue 
sequences in pre-divergence proteins [14] previously revealed that monomers conserved 
across species domains exhibit 93-100 per cent identity with corresponding monomers in 
the LCA sequence reconstructed with the aid of phylogenetic trees (see Methods, tRNA 
sequences). Around half (seventeen) of the conserved sites in the present survey were 
identical among all tRNA species, indicating they had remained invariant during tRNA 
evolution. Information related to tRNA diversification during code evolution resided in the 
remaining (sixteen) non-universal conserved sites (see Methods, Fig. 7). 

FIGURE 1 
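
The site classification described above can be illustrated with a minimal Python sketch. It assumes each tRNA species is represented by one aligned consensus sequence per primary domain; the function names and the toy five-site sequences are hypothetical and are not data from this survey. A site is treated as conserved when all domain consensus sequences carry the same base, as universal when every species conserves the same base there, and as non-universal when it is conserved in at least one species without being universal.

```python
# Sketch only: names and toy sequences are illustrative assumptions, not survey data.

def conserved_trace(domain_consensus):
    """Keep a base where all domain consensus sequences agree; '-' elsewhere."""
    columns = zip(*domain_consensus.values())
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in columns)


def split_sites(traces):
    """Separate universal sites from non-universal conserved sites."""
    universal, non_universal = [], []
    for site, col in enumerate(zip(*traces.values())):
        bases = {b for b in col if b != "-"}
        if "-" not in col and len(bases) == 1:
            universal.append(site)      # same base conserved in every species
        elif bases:
            non_universal.append(site)  # conserved in some species, not universal
    return universal, non_universal


# Hypothetical aligned consensus sequences (one per domain) for two tRNA species.
toy = {
    "species_x": {"archaea": "GCAUG", "bacteria": "GCAUA", "eukarya": "GCCUG"},
    "species_y": {"archaea": "GUGUG", "bacteria": "GUGUG", "eukarya": "GUGUC"},
}

traces = {name: conserved_trace(doms) for name, doms in toy.items()}
universal, non_universal = split_sites(traces)
print(traces)
print("universal sites:", universal)
print("non-universal conserved sites:", non_universal)
```

In this toy case only the non-universal sites would carry information about diversification, which mirrors the role of the sixteen non-universal conserved sites in the survey.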



Thirty-nine of the LCA aa adaptors were type I tRNA. They separated into three 
subtypes, with distinct core structures (Fig. 1). Bases in the D stem, its surround, and 
variable loop of LCA tRNA species showed they had core structures recognizable from the 
L forms of tRNA [29] in phylogenetically deep archaea. The remaining nine adaptors were 
type II tRNA adaptors for Ser"* and Leu^. 

Generic LCA tRNA-I with a group A core (Fig. 2) contained a D stem with 4 b.p. 
between segments 8-UAGCUCAG and 21-AGAG'^uG, and a 5 nt. variable loop 44-AGGUC 
(bold letters indicate conserved bases). An archaeal tRNA-IA core [29] differs 
from this at five sites (archaea -> LCA): R9A, Y13C, R26G, N44A, and Y47U (tRNA-IA^Arg, 
anticodon 3'GCA, contained, in addition, a C25U substitution). These differences reflect improved base 
resolution in the present study, with a larger number of base sequences (1100 vs. 36) 
and more phylogenetically diverse sources (species domains, 3 vs. 1). 

Twenty tRNA-I species conserved evidence of a group A LCA ancestor (Fig. 1). 
Seventeen were charged by six Asp-derived aa [30-32]: Asn^, Thr^, Met'', Ile^, Arg^, 
Lys^°. The remaining three ancestral tRNA-IA were charged by Ala"*, a Pyr-derived aa 
[33]. Five tRNA-I conserved evidence of an ancestor with a modified group A core. 
Pyruvate-derived aa, Val^, acylated three tRNA-IA', and Ser"^ family members, Cys^ and 
Trp^^ each charged a tRNA-IA'. 

Pre-divergence adaptors for Phe" and Tyr^^ had a 4 b.p. D stem and 5 nt. variable 
loop, similar to archaeal tRNA-IB adaptors. Their variable loop also contained a 45U (Fig. 
1), as in an archaeal group-B core [29]. Unlike the novel G10:45G pair in tRNA-IA, a 
tRNA-IB has a G10:45U pair (Fig. 2). Purine-purine pair, G26:44A, is ubiquitous among 
tRNA-IA. In an LCA tRNA-IB, however, it is not strongly conserved (Fig. 1, 2). Variable-loop 
sites 44, 45 thus distinguish ancestral aromatic aa adaptors from tRNA-IA: 44-AG 
(tRNA-IA) -> 44-NU (tRNA-IB). On the other hand, archaeal tRNA-IB have a U11:24A pair 
[29], while LCA tRNA-IB had a C11:24G pair, as in a group IA adaptor (Fig. 2). Group B 
is viewed as a subset of group A, in archaeal tRNA. [29] A closer kinship in the LCA 
conforms with tRNA-IB being an off-shoot in tRNA-IA diversification (see Fig. 5). It also 
implies the significance of 45U as an identity element in the recognition of tRNA-IB^Phe and 
tRNA-IB^Tyr. 

FIGURE 2 

Ancestral tRNA-ID had 4 b.p. in their D-stem and a 4 nt. variable loop, with site-47 
lacking, except for Pro"^ adaptors (Fig. 1). A supernumerary D-loop base occurred at 20A. 
Two departures from an archaeal group-ID core [29], R9G and K11U (K denotes G or U), 
reaffirm improved resolution in the present, larger survey. At site 24, pre-divergence 
tRNA-ID conserved an A, or C, analogous to an archaeal group ID core. D-Stem b.p. 
U11:24A(C), G12:23C, and U13:22U in tRNA-D differ from the corresponding tRNA-IA 
and -IB b.p., C11:24G, U12:23A, C13:22G. An unusual U13:22U pair occurs in the D- 
stem of all tRNA-ID. An isomorph, with a U13:21A pair, shifts U22 into a mini-loop (Fig. 
2). Triple bonds C25:G10:G45 and C25:G10:U45 in tRNA-A and -B, respectively, are 
replaced by a U25:G10:G45 bond in tRNA-ID. Group D tRNA-I also have a U13:A21:A46 
triple, instead of the C13:G22:G46 bond in tRNA-A and -B. As noted, a strongly 
conserved G26:44A bond occurs in tRNA-A, while tRNA-D generally form an A26:44A 
bond. 

Consistent with archaeal tRNA, [29] four Glu family aa (Glu^ Gln^ Pro\ His^^) and Asp^ 
account for all nine LCA group D tRNA-I (Fig. 1). Early tRNA^Gly also displayed a modified 
group D core (Fig. 2). D-stem sites 10-15 were GUNUAG in tRNA^Gly and GUGUAG in generic 
LCA tRNA-ID (Fig. 1, 2). The ancestral Gly adaptor also had a 4 nt. variable loop, 44-UGA-C, 
with a gap at site-47, and, apart from an A44U substitution, this matches an 
ancestral tRNA-ID variable loop, 44-AGA-C. In archaea, tRNA-I^Gly (CCP'CCG) and 
tRNA^Val (3'CAG) are designated as type IC tRNA, [29] characterized by a 10-GUCUA 
D-stem segment and 5 nt. variable loop, 44-NRANC. 
Pre-divergence tRNA-I^Val (3'CAG) variable loop, 44-'°'gGGUC, closely matches the generic 
LCA group-A tRNA-I variable loop, 44-AGGUC (Fig. 1). This distinguishes it from an 
archaeal group-C adaptor, with a 44-NRANC variable loop. The D stem of this Val 
adaptor was insufficiently resolved to classify. Archaeal Val isoacceptor, tRNA-I^Val (3'CAU), 
has a modified group A core. [29] All three pre-divergence Val adaptors, tRNA-I^Val 
(3'CAY, 3'CAG), in this survey, had a 44-AGGUC variable loop and a C25:G10:G45 triple bond, 
consistent with being type IA related tRNA. In this connection, all LCA isoacceptors 
examined displayed core structure homology (Fig. 1, 2). This favored shared ancestry 
and common identity elements in isoacceptors recognized by a single aaRS. Clearly, tRNA 
gene recruitment occurred far less frequently during biological evolution than 
experimental observations would imply [29, 34], and this parallels the constrained 
malleability of the genetic code. [35] 

A modified group-A core occurred in LCA adaptors for Cys^ and Trp^"^. They had an A- 
like D-stem, 10-GCNNAG/21-A'^gNGCA, 5 nt. variable loop (44-NN^uU^u) and a 
C25:G10:N45 triplet. Both are Ser-derived aa. [30,32] This implies a group A tRNA-I Ser 
adaptor was the source of type II tRNA adaptors. 

Channeled diversification of tRNA species in the pre-LCA era 

Neighbor-joining calculations [39] with Jukes-Cantor distances [40] at non-universal sites 
shared between the conserved trace of LCA tRNA produced the dendrogram in Fig. 3 (see 
Methods, Phylogenetic analysis). The tree reconstructs tRNA diversification in the pre- 
LCA era from vestiges of ancestral tRNA conserved within populations of extant tRNA 
base sequences. 
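
As a minimal illustration of the distance measure named above, the following Python sketch computes the proportion of differing bases between two aligned traces and converts it to a Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3). The two eight-site traces are hypothetical, and skipping any site where either trace has a gap ('-') is an assumption made here; the procedure actually used is given in Methods.

```python
import math


def p_distance(trace_a, trace_b, gap="-"):
    """Proportion of differing bases over sites conserved in both traces."""
    shared = [(a, b) for a, b in zip(trace_a, trace_b) if a != gap and b != gap]
    if not shared:
        raise ValueError("no shared conserved sites")
    differences = sum(a != b for a, b in shared)
    return differences / len(shared), len(shared)


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined (infinite) once p reaches 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical traces: 8 shared sites, 2 differences, so p = 0.25.
p, n_sites = p_distance("GCAUGGCA", "GCAUAGCC")
d = jukes_cantor(p)
print(n_sites, p, round(d, 4))
```

A matrix of such pairwise distances is the input a neighbor-joining routine would take to build the tree.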

FIGURE 3 
Construction of this tree follows the procedure illustrated in uncovering a 23-residue Fd 
antecedent (Pro-Fd-5) from the vestige of a tandem duplication in the 55-residue 
sequence of LCA ferredoxin (Fd). [14, 41] Statistical evaluation of the tree topology 
obtained on analysis of mutation distances between homologues of Fd from anaerobe 
Clostridium pasteurianum relied on demonstrating a fall-off in internal identity with 
increasing distance from the tree root. Rather than assess the tree from a set of nodal 
bootstrap probabilities, a single probability thus indicated whether it was globally 
ordered. In the event that early tRNA diversification and growth of aa synthesis pathways 
were linked, as suggested by results in the previous section, the path-distance model 
would favor early (short-path) aa having early tRNA adaptors. tRNA ancestral to extant 
adaptors with short-path aa accordingly should cluster near the tree root. In principle, 
this provides a means to locate the root of the LCA tRNA tree (Fig. 3); since these tRNA 
are from a single taxonomic source, use of outgroups was not an option. Furthermore, aa 
code age could be anticipated to increase with tRNA distance from the tree root in a valid 
topology. 

Figure 4 shows that aa code age increased linearly, at 1.57 stages per unit distance, in 
the tree depicting pre-LCA tRNA diversification. Tree topology supports a positive 
increase in aa code age with tRNA distance at probability, p(β > 0) = 0.948 (t(df = 30) = 
1.671), where β is the regression equation slope. [36] This increase occurred over 
stages 2-7 of code evolution, corresponding to initiation and expansion phases of code 
evolution in the path-distance model. [13-15] The correlation does not extend to the 
post-expansion phase, when Arg^, Lys^°, Phe^^ Tyr^^ His^^, Trp^"^ were added to the code. This 
is anticipated, as they apparently captured short-path aa adaptors, following subdivision 
of error-prone code boxes (sets of four) into doublets. [13,15,18] Serine"* adaptors were 
also non-conforming, and these type II tRNA evidently replaced the type I tRNA originally 
charged by this early expansion phase aa (see Types and subtypes of LCA tRNA, Identity 
and source of LCA tRNA). 

FIGURE 4 
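
The quoted probability for a positive slope can be checked with a short Python sketch. It assumes, as an interpretation rather than a statement of the method used, that p(β > 0) is the one-sided Student t probability at the reported statistic, t = 1.671 with 30 degrees of freedom. The function names are hypothetical, and the cumulative probability is obtained by Simpson's-rule integration of the t density so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    if t < 0:
        return 1 - t_cdf(-t, df, steps)
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


prob_positive_slope = t_cdf(1.671, 30)
print(round(prob_positive_slope, 3))
```

Under these assumptions the result lies close to the reported 0.948, that is, just short of the conventional one-sided 0.95 level.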

LCA tRNA tree topology (Fig. 3) incorporated a second global feature: aa synthesized 
from different precursors in biosynthesis differ in tree stacking rank. Differences in aa 
family stacking rank arose by chance with a probability, p = 2.05 × 10⁻² (χ²(df = 4) = 
11.61), in a Kruskal-Wallis analysis. [42] Inspection of the LCA tRNA tree reveals that 
adaptors for same-family aa cluster within distinct 'monophyletic' tree regions, excluding 
both Shi family adaptors, tRNA-IB^Phe (3'AAG) and tRNA-IB^Tyr (3'AUG) (Fig. 3). At the tree 
base, region-1 contains eight tRNA-ID species and tRNA-II^Leu (3'GAU); as this type II tRNA 
is in a type I tRNA region, gene recruitment [29, 34] can be seemingly excluded. Seven are Glu 
family adaptors; tRNA-ID^Asp being the only non-Glu family type-ID adaptor. Anticodons of five 
region-1 tRNA contain a 35U and six had a 36G - respectively, anticodon mid- and 3'-base. 
Three aa (Asp^, Glu^, Gln^) with region-1 tRNA adaptors were NH4+ fixers. They 
form on 1-2 step paths and are first-generation aa in the path-distance model of code 
evolution. [13, 15] Amino acid path-distances thus place region-1 at the root of the tRNA 
tree. Branch lengths in region-1 and -2 are scaled to one-tenth their original length, 
suggesting tRNA initially changed more rapidly than the repertoire of aa. 

Pyruvate-derived aa charged six (6/8) tRNA in region-2. All were type IA tRNA. 
Isoleucine^ also charged a region-2 type IA tRNA, raising the possibility of gene 
recruitment. tRNA-ID^Gly (3'CCG) was the only non-type IA adaptor in region-2. Seven 
region-2 tRNA had a 36C. Region-3 contained eighteen tRNA. Fourteen were adaptors for 
Asp-derived aa and they charged type IA tRNA. Twelve had a 36U. Among thirteen tRNA in 
region-4, Ser family aa acylated eight. Seven type II tRNA branched in region-4. Five 
region-4 tRNA contained a 36A and six a 35C. Adaptors for Shi derived aa, Phe", Tyr^S 
branched in region-3 (Fig. 3). Consistent with this, identity levels indicate their type IB 
adaptors descended from a tRNA-IA (see Table 1); they form a fifth cluster-region in 
archaeal (Haloferax volcanii) tRNA (unpublished result). 

The pattern of pre-LCA tRNA diversification apparent in Fig. 3 occurs, in muted form, in 
the tree constructed from sequences (possessing post-divergence variability) of thirty-six 
archaeal tRNA species by the Fitch-Margoliash method (range of nodal bootstrap 
probabilities, p = 0.36-1.0). [29] As in Fig. 3, region-1 cluster had adaptors for Glu^, 
Gln^, Pro"*, His^-^, Asp^, and region-2 had Ala"^, Val^ adaptors. Region-3 contained Thr^, 
Met^, Ile'', Lys^° adaptors, while Phe^^ and Tyr^^ adaptors separated into a distinct cluster 
(region-5). Adaptors for Ser"*, Cys^, Trp^"^ and Leu^ (region-4) were omitted from this 
analysis. Adaptors for thirteen aa (13/16) thus cluster in archaeal tree regions 1-3 and 5, 
corresponding to an 81 per cent cluster-rate. This compares favorably with the 85 per 
cent (17/20) rate evident in the LCA tree (Fig. 3). Consistent with this agreement, 
archaeal tRNA were found to have evolved only slowly. [27] tRNA adaptors for 
synthetically related aa cluster more in evolutionarily deep sources, in accord with tRNA- 
dependent aa synthesis in the pre-LCA era (unpublished result). 

Same-family aa preferentially charged phylogenetically-related tRNA, in the pre-LCA 
era, according to present results. Combining probabilities for tRNA aa identity 
distribution, within each region, revealed significant clustering in the tRNA tree; p = 
1.36 × 10⁻⁸, χ²(df = 10) = 56.90 (see Appendix A: Distribution of amino acid identity, core 
group and anticodon bases in LCA tRNA tree). tRNA core structure also showed 
significant clustering within tree regions, on a global scale; p = 1.21 × 10⁻⁶, χ²(df = 10) = 
46.42. 3'-Anticodon base identity clusters were equally significant; p = 1.04 × 10⁻⁶, χ²(df 
= 10) = 46.77. In contrast, mid-base identity clusters were comparatively weak, p = 
4.47 × 10⁻², χ²(df = 10) = 18.67. 
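
The χ² probabilities quoted in this section can be re-derived with a few lines of Python. For an even number of degrees of freedom the upper-tail probability has a closed form, exp(-x/2) multiplied by the sum of (x/2)^i / i! for i from 0 to df/2 - 1, so no statistics library is required. The sketch also shows Fisher's method for combining independent probabilities; that choice is an assumption, since the text does not name the procedure, although df = 10 is what Fisher's method yields for five region-wise probabilities. Function names are hypothetical, and the five probabilities in the last example are illustrative, not values from Appendix A.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variate when df is even."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # running value of (x/2)**i / i!
        total += term
    return math.exp(-half) * total


def fisher_statistic(p_values):
    """Fisher's combined statistic, -2 * sum(ln p), with df = 2 * len(p_values)."""
    return -2.0 * sum(math.log(p) for p in p_values)


# Statistics quoted in the text.
print(chi2_upper_tail_even_df(11.61, 4))    # Kruskal-Wallis, df = 4
print(chi2_upper_tail_even_df(46.42, 10))   # core structure clusters
print(chi2_upper_tail_even_df(18.67, 10))   # anticodon mid-base clusters

# Illustrative only: combining five hypothetical region-wise probabilities.
example = [0.01, 0.03, 0.002, 0.2, 0.05]
stat = fisher_statistic(example)
print(stat, chi2_upper_tail_even_df(stat, 2 * len(example)))
```

Run on the reported statistics, the closed form returns approximately 0.0205 for χ²(df = 4) = 11.61, which matches the Kruskal-Wallis probability given above.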

Acquisition of phylogenetically related tRNA by synthetically related aa links pre-LCA 
tRNA diversification with growth of aa synthesis pathways during code evolution. This 
provides compelling evidence that tRNA were cofactors in pre-LCA aa synthesis 
pathways. Adaptors for related aa also displayed elevated core structure homology and 
3'-anticodon base identity. Elevated 3'-anticodon base identity in same-family tRNA 
adaptors (Fig. 3) is congruent with 5'-base invariance among codons for same-family aa. 
[16] Code expansion in a column-by-column pattern, through mid-base substitutions, 
conforms with both weak anticodon mid-base clusters and 5'-base invariance among 
same-family aa codons. [13,15] Present results point to the code containing domains of 
contiguous codons read by tRNA adaptors, displaying core structure homology, for same- 
family aa. 



Identity and source of LCA tRNA species 

The source of pre-LCA tRNA species was sought in this section from quaternary identity 
levels in the conserved trace of ancestral tRNA. In addition to elevated trace identity, 
time-order based on aa path-distance implies that the aa attached to the source of a pre- 
LCA tRNA has a comparatively short-path. [13, 15] Metabolic source of the acylating aa, 
adaptor core structure (Fig. 1, 2) and location within the LCA tRNA tree (Fig. 3) were also 
noted in forming adaptor/ancestor pairs. tRNA with a 34U were chosen for this study, 
when available, as the path-distance model suggests U occupied the 5'-anticodon site in 
the initial stages of code evolution. [13, 15, 18] The source of nineteen (19/23) 
pre-LCA tRNA species has been reported. 

TABLE 1 

Traces of LCA tRNA shared from to 20 non-universal sites, conserved across species 
domains. Their intrinsic identity varied from (Leu^, 3'GAU -> Arg^, 3'GCU, anticodon 
triplets) to 100 (Asn^, 3'UUG -> Lys, UUU) per cent. While quaternary identity levels 
ranged from to 16 quarts (Table 1). Thirteen (13/19) adaptor/ancestor pairs showed 
same-family aa identity, two being charged by the same aa (isoacceptor pairs). Identity 
values for these pairs appear within triangles in Table 1. Sixty-eight per cent (13/19) of 
adaptor/ancestor pairs identified thus involved same-family aa, despite same-family aa 
pairs representing only 15 per cent (37/253) of the number of pairwise combinations. 
Their frequency is statistically significant in a contingency table, with χ² corrected for 
continuity; χ²(df = 1) = 9.21, and p = 2.4 × 10⁻³. tRNA in twelve same-family aa pairs also 
clustered together in the LCA tRNA tree (Fig. 3). Present results thus corroborate the 
earlier finding that tRNA charged by same-family aa generally descended from a common 
ancestor (See Channeled diversification of tRNA species in the pre-LCA era). 
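
The proportions and the single-degree-of-freedom probability in this paragraph can be reproduced with the short Python sketch below. It takes the reported statistic, χ²(df = 1) = 9.21, as given and does not attempt to rebuild the contingency table, whose layout is not stated here. For df = 1 the upper-tail probability equals erfc(sqrt(x/2)). The function name is hypothetical; the observation that 253 is the number of unordered pairs among 23 tRNA species is an inference from the 19/23 count above.

```python
import math


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


pairs_possible = math.comb(23, 2)          # unordered pairs among 23 species
found_fraction = 13 / 19                   # same-family share of identified pairs
background_fraction = 37 / pairs_possible  # same-family share of all possible pairs
p_value = chi2_upper_tail_df1(9.21)

print(pairs_possible)
print(round(100 * found_fraction), round(100 * background_fraction))
print(p_value)
```

The first two printed lines recover the 253 pairwise combinations and the 68 and 15 per cent figures quoted in the text.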

Serine"^ -> Cys^ (trace identity, 8.5 quarts) was the only non-conforming pair, as 
tRNA-II^Ser (3'AGU) and tRNA-IA^Cys (3'ACG) branched in region-4 and -3, respectively. Ten 
adaptor/ancestor pairs also displayed core structure homology. Three heterologous pairs 
involved adaptors for Ser-derived aa, tRNA-IA^Cys (3'ACG), tRNA-IA^Trp (3'ACC) and 
tRNA-ID^Gly (3'CCU), indicative of the putative type I -> II tRNA transition by the Ser"* 
adaptor. Elevated tRNA-II^Ser (3'AGU) identity with tRNA-IA^Asn (3'UUG) (9.2 quarts, Table 1) 
suggests Ser"* once charged an Asn-like, type I tRNA cognate with NCN column codons, UCN. 
The transition to a type II tRNA opened the way for relocation of anticodon-loop identity 
elements to an enlarged variable loop, and allowed assignment of multiple codon sets (NCN 
and NGN column triplets) to Ser"* during code expansion from the NH4+ Fixers Code. [13, 15] 
Adaptor identity in the Tyr^^ -> Trp^'* pair may be noted to slightly exceed that in the Ser"^ 
-> Trp^"^ (aa precursor/product) pair: 8 vs. 7 quarts (Table 1). However, tRNA^Ser (3'AGU) 
branched in tree region-4, with tRNA-IA^Trp (3'ACC), while tRNA-IB^Tyr (3'AUG) did not 
(Fig. 3), favoring the Trp precursor adaptor as source. In addition, the Trp-adaptor showed 
core structure homology with the adaptor for sibling aa, Cys^, and putative ancestral 
Ser-adaptor. tRNA-II^Leu (3'AAU) shows kinship with tRNA-II^Ser (3'AGU) (trace identity, 
5.7 quarts; Table 1). Nearest-neighbor codons in the UNN row (UCN, UUR) also encode Ser"^ 
and Leu^. Most Leu isoacceptors, moreover, branch in tree region-4 with Ser adaptors 
(Fig. 3). These findings point to tRNA-II^Leu (3'AAU) descent from a variant tRNA-II^Ser 
(3'AGU) (Table 1). Because the synthesis of Ser"^ and Leu^ originate at SPG and Pyr, 
[30, 31] respectively, this path of descent appears to uncouple aa pathway growth and tRNA 
diversification, in a departure from earlier findings (see Channeled diversification of 
tRNA species in the pre-LCA era). Alternatively, an early aa family, containing both Ser"* 
and Leu^, could have fragmented following pathway rearrangement. This development 
apparently accompanied the protein takeover of aa synthesis pathways, some time after 
appearance of the type II tRNA Leu-adaptor. With evidence a tRNA^Asn-like adaptor was 
ancestor of the Ser-adaptor (Table 1), the Asp family once plausibly included Ser family aa. 
Leucine^ sibling, Ala"^, likewise charged a tRNA^Asn-like adaptor (trace identity, 11 quarts 
with pre-Lys^° adaptor, Table 1). Inclusion of both Ser and Pyr aa families in a pre-LCA Asp 
family could account for the kinship observed between Ser- and Leu-adaptors within the 
framework of coordinated tRNA diversification and growth of amino acid synthesis 
pathways (see Channeled diversification of tRNA species in the pre-LCA era). 
Consistent with an enlarged pre-LCA Asp family, adaptors for fourteen aa were 
traced to a primordial Asn adaptor, tRNA-IA^Asn (anticodon UUU). They cluster in the 
larger of two deeply branching domains in the LCA tRNA tree in Fig. 5. Nodes in this tree 
join an LCA tRNA with its putative ancestor (Fig. 5a), based on quaternary identity levels 
(Table 1). Branch lengths (ordinal value) correspond to the attached aa code age, estimated 
from path-distance. Progression along the /-axis thus depicts events at increasingly later 
stages of code evolution. [13-15] The smaller, earlier domain contained five tRNA-ID charged by 
Glu family aa and Asp^. Aspartate^ and Glu^ are precursors, in biosynthesis pathways, to 
half (nine) aa. [13,15,30-33] Aspartate- and Glu-adaptors were not, however, the 
apparent source of product aa adaptors (Table 1). This finding is counter to the 
'precursor-product' hypothesis, wherein an incoming aa captured a precursor-adaptor on 
entering the code. [13,15,39] Present findings instead implicate precursor misacylation 
of a variant sibling adaptor, generally a tRNA^Asn-like, or tRNA'^'"-like, adaptor, as the 
mechanism by which new aa entered the code (Fig. 5a). 

FIGURE 5 

At the tree root is a tRNA-ID. It was uncovered on backtracking from the NH4+ Fixers 
Code [13, 15]. The primordial tRNA adaptor appears to have been a pre-informational 
'universal' tRNA adaptor charged by an Asp^, Glu^, Asn^, or Gln^ (Fig. 5a). It is credited 
with assembling random sequence amide-bearing, polyanionic polypeptide chains, on a 
poly(A) template. [13-15] Replacement of the first tRNA adaptor implies competition for 
AAA codons with tRNA-IA^Asn (anticodon UUU) took place. Evolution of a type IA tRNA, 
specific for Asn^, would distinguish it from the primordial adaptor, 
tRNA-ID^Asp/Glu/Asn/Gln (anticodon UUU). This led to polymerization of ordered sequence 
polypeptide chains directed by the NH4+ Fixers Code, in the path-distance model of code 
evolution. 

The large Asn-adaptor domain contains three sub-domains, rooted at Ser, Met, and Ala 
adaptors (Fig. 5a). Seven tRNA species branch in the Ser sub-domain, including three 
tRNA-I adaptors for Ser-derived aa, Cys^, Gly^, and Trp^"^. There are five tRNA species in 
the Met-adaptor sub-domain, and two in the Ala-adaptor sub-domain. tRNA-IB^Phe 
branched in region-3 of the LCA tree together with adaptors for Asp-derived aa (Fig. 3). 
This is in accord with descent from tRNA-IA^^' (Fig. 4, Table 1). Both tRNA-IA^^' and 
tRNA-IB^Phe notably read NUN column codons. The adaptor for Phe^^ sibling, Tyr", 
likewise, branched in region-3. Adaptors for siblings, Ala"* and Val^, form the Ala sub- 
domain. The adaptor cluster pattern in the large tRNA tree domain (Fig. 5a) supports the 
proposed fragmentation of an enlarged pre-LCA Asp family into Asp, Pyr, Ser, and Shi 
families. [15] 

Radiation of tRNA^Asn, leading to formation of the large tree domain, could reflect use of 
distinct tRNA-I subtypes for precursor and product aa in the Asp family (Fig. 5b). 
Aspartate^ acylated a type ID tRNA^Asp and misacylated a type IA tRNA, showing kinship 
with an Asn-adaptor, during product aa synthesis. This contrasts with the small domain, 
where Glu^ acylated and misacylated a type ID tRNA. Use of different tRNA-I subtypes 
for precursor and product aa would aid molecular recognition, during tRNA^Asn radiation. 
In this connection, pre-LCA tRNA-dependent aa synthesis possibly reduced the number of 
(ribozymal) aminoacyl-tRNA-synthetases (r-aaRS) to two: those for the primordial aa 
precursors, Asp^ and Glu^. This would shift recognition away from the product aa itself to 
its tRNA adaptor/cofactor, whose base sequence evidently encoded aa identity and 
synthesis pathway in the RNA World. 

Coding domains in the Standard Code 

Figure 6 depicts the Standard Code as a patchwork of distinct domains. Each domain is 
specified as a region of contiguous codons read by tRNA, possessing core structure 
homology, that are charged by aa synthesized from a common precursor. They have a 
row-wise orientation within the code, as anticipated from evidence for columnwise 
growth of the code (see Channeled diversification of tRNA species in the pre-LCA era). 

FIGURE 6 

Codons for Asp family aa (Asp^, Asn^, Thr^, Ile^, Met^, Arg^, Lys^°) form the largest 
domain. It contains twenty triplets spread principally along the ANN row. Only fifteen are 
distinguishable during translation, however, as a result of 3'-pyrimidine (Y) degeneracy. 
[43] Aspartate^ codons, GAY (Fig. 6), are not in the Asp family domain, because Asp^ 
charges a type ID tRNA (Fig. 1), making it heterologous with the type IA adaptors for 
Asp-derived aa. CNN row codons, CG(A/G), assigned to Arg^, are nearest-neighbor to ANN 
row Arg codons, AG(A/G), so its CNN row codons (CG(A/G), CCY) are a contiguous extension of 
the Asp family domain. Aspartate family aa form on pathways that originate at 
oxaloacetate (OAA) in the citrate cycle, apart from Arg^. [30, 31] Inclusion of Arg^ in this 
family derives from a reaction between Asp^ and Cit^, at step 8 of synthesis, making Asp^ 
the last direct precursor. [43] 

Glutamate family aa (Glu^, Gln^, Pro"*, His^'^) form on pathways originating at
α-ketoglutarate (α-KG) in the citrate cycle, discounting His^'^ whose synthesis commences
at ribose-5-phosphate in the pentose cycle. Glutamine^ donates an amide group at step 9
of His^'^ synthesis (measured from the citrate cycle) [30], recruiting this aa for the Glu^
family. Five aa encoded by CNN row codons include three Glu family members (Fig. 6).
All four Glu family aa have tRNA-ID adaptors, cognate with eight distinguishable,
contiguous codons (GAR, CAY, CAR, CCY, CCR).

Pyruvate-derived siblings, Ala"* and Val^, charged type IA-related LCA tRNA cognate
with GNN row codons, GCN, GUN (Fig. 6). A third Pyr-derived aa, Leu^, acquired NUN
column codons (UUR, CUN). Furthermore, Leu^ charged type II tRNA. This placed Leu^
codons outside the Pyr family code domain.

Serine family aa (Ser"*, Cys^, Gly^, Trp^"*) derive from 3PG in the central trunk, with
Trp^"^ the sole exception. A reaction between Ser'* and indole, in the final step of
synthesis, recruited Trp^"^ into the Ser family. [30] Cysteine^ and Trp^"^ charged type
IA-related LCA tRNA, cognate with UNN row codons: UGY and UGG, respectively (Fig. 1). A
small Ser domain resulted in the code UNN row.

Aromatic aa, Phe^^ and Tyr^^, form on the Shi pathway, originating with a reaction between
phosphoenolpyruvate (central trunk) and erythrose-4-phosphate (pentose cycle). [30]
They acylated type IB LCA tRNA cognate with codons UUY and UAY (Fig. 1, 2). A second
small domain thus resulted in the code UNN row.

Thirty-two (32/45) distinguishable codons (STOP codons excluded) occur within five 
domains of the Standard Code; equivalently, forty-three (43/61) triplets allocated to aa 
occur within code domains. They encode fifteen siblings and one precursor (Glu^) aa. 
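
The codon counts quoted above (61 triplets allocated to aa, 45 distinguishable codons once 3'-pyrimidine degeneracy is applied and STOP codons are excluded) can be checked with a short enumeration. This is a minimal sketch; the helper name `distinguishable_form` is introduced here for illustration and is not part of the original analysis.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def distinguishable_form(codon: str) -> str:
    """Collapse a 3'-pyrimidine (U or C) to the ambiguity code Y."""
    return codon[:2] + "Y" if codon[2] in "UC" else codon


# All 64 triplets minus the three STOP signals.
sense_codons = ["".join(t) for t in product(BASES, repeat=3)]
sense_codons = [c for c in sense_codons if c not in STOP_CODONS]

# Codons that differ only by a 3'-pyrimidine fall into the same class.
distinguishable = sorted({distinguishable_form(c) for c in sense_codons})

print(len(sense_codons), len(distinguishable))
```

Each of the sixteen NNY pairs collapses to one class, which is why 61 sense triplets reduce to 45 distinguishable codons.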

TABLE 2

Domain codons are contiguous. Parameters determining the probability for chance
occurrence of nearest-neighbor codons between each aa and other members within the
same domain, synthesis family, or tRNA tree region appear in Table 2. Codon contiguity
was strongest within aa families. The combined probability [37] for codon contiguity
among same-family aa equaled 5.09x10^-8, based on χ²(df = 40) of 106.97. Codons read
by tRNA, with designated aa identity, branching within the same tree region (Fig. 3), or
within the same code domain (Fig. 6) had a combined contiguity probability of 1.02x10^-7
(χ²(df = 40) = 104.9), and 2.16x10^-6 (χ²(df = 32) = 82.9), respectively. Closely
contiguous codons thus encode aa of the same family, tRNA tree region, and code
domain. Codons spanned by tree regions were two-fold less contiguous than codons of
families, and domains were 42-fold less contiguous. Precursor heterogeneity (3PG, Pyr)
among aa (Ser"^, Leu^) charging type II tRNA, and adaptor/ancestor heterogeneity within
the Ser family (tRNA-II^^'' ^ tRNA-IA'^^"; tRNA-II^^' -> tRNA-IA''^''^; tRNA-II^^' ^ tRNA-
ID''^'^) lowered codon contiguity within code domains. Adaptor heterogeneity was
attributed to replacement of an ancestral type I tRNA by a type II tRNA, facilitating
allocation of multiple codon sets to a single aa, during rapid expansion from the NH4+
Fixers Code (see Identity and source of LCA tRNA species).
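
The combined probabilities quoted in this paragraph follow from Fisher's method, in which -2 Σ ln p is referred to a chi-square distribution with two degrees of freedom per combined probability. The sketch below is an illustration rather than the author's own procedure: it uses the closed-form upper tail that holds for even degrees of freedom, and the function names are chosen here for convenience.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Return (chi-square statistic, df, combined probability)."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)        # always even, so the closed form applies
    return stat, df, chi2_sf_even_df(stat, df)


# Chi-square totals quoted in the text for families and code domains.
p_family = chi2_sf_even_df(106.97, 40)
p_domain = chi2_sf_even_df(82.9, 32)
print(f"{p_family:.2e} {p_domain:.2e}")
```

With the quoted totals, the family value comes out near 5.1e-8 and the domain value near 2.1e-6, in line with the mantissas 5.09 and 2.16 given above.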

Amino acid families, code domains, and tRNA tree regions overlap extensively in the 
Standard Code (Fig. 6). Amino acid precursor correlated with aa identity of adaptors in 
the same tRNA tree region (Fig. 3, 4) to produce a Kendall rank coefficient [42], τ, of
0.92, p = 7.0x10"^ (Table 2). Same-family and same-domain aa corresponded with a
τ-coefficient of 0.68, p = 1.5x10"^. Amino acid identity of same-tree region tRNA and
same-domain aa had a τ-coefficient of 0.72, p = 8.3x10'^.

Code structure has provided compelling evidence that tRNA diversification from a 
primordial adaptor, growth of aa synthesis pathways outward from central metabolism, 
and codon assignments during code evolution were coordinated. Amino acid synthesized 
from a common precursor generally charged tRNA descended from a common ancestor 
and cognate with codons within the same code row. A dual tRNA role, as cofactors in aa
synthesis and adaptors in translation, is credited with coordinating aa pathway growth
and code evolution with pre-LCA tRNA diversification. Columnwise expansion of the code
from NAN → NCN → NGN → NUN [13, 15] is credited with producing the row-wise
orientation of code domains (Fig. 6). 5'-Base invariance among codons of same-family 
aa, as noted, also conforms with this growth pattern. [17] Growth of the code column- 
by-column reduced the risk of a potentially lethal [44] mutation to an unassigned triplet, 
within a template sequence; lowering it initially to the chance of a codon mid-base 
substitution. [13, 15] Code structure also implied code expansion was subject to a 
hydrophobic attractor [15] and error-minimization, targeting codon mid-base misreading. 
[18] Overprinting in code rows with short-path sibling aa (Lys/Asn, His/Gln;
Trp/Cys), by post-expansion additions to the code (Arg^, Lys^°, Phe^^, Tyr", His^^,
Trp^"^), indicated the dual cofactor/adaptor role of pre-LCA tRNA channeled codon 
assignments throughout code evolution. [13, 15] 



Methods 

tRNA sequences 

Sequences of forty-eight tRNA species were obtained in a search of the Bayreuth 
database (www.uni-bayreuth.de/departments/biochemie/tRNA/). [28] In total, 1100 base
sequences, around 73 bases each, from sources in Archaea, Bacteria, and Eukarya were
examined in this survey (see Appendix B: tRNA sources and access numbers (Bayreuth
database)). A consensus sequence was determined for tRNA, with designated aa identity
and codon specificity, for sources from each species domain. The trace of an ancestral 
tRNA, predating species divergence from the LCA, was reconstructed from sites with an 
identical base in the consensus sequence representative of a tRNA species in each of the 
three domains of life. Residues conserved across species domains in consensus 
sequences for low potential ferredoxin and proteolipid helix-1 were previously reported to 
match corresponding residues in the LCA monomer sequence, inferred from phylogenetic 
trees, with a 93-100 per cent accuracy. [14] The computational efficiency achieved by 
using consensus sequences, to filter out post-divergence variability, incurred a cost, as
the number of sites resolved in the LCA molecule was a half to three-quarters of that 
obtained by phylogenetic analysis. More complete reconstruction of LCA sequences with 
phylogenetic methods raises the possibility that a future investigation may extend the 
results presented in this paper. 
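
The two-stage reduction described above (a consensus per species domain, then retention of sites identical across all three domains) can be sketched as follows. The paper does not state the rule used to call a consensus base, so the strict-majority threshold, the toy sequences, and the function names are assumptions made for illustration; 'N' marks a site with no assigned base.

```python
from collections import Counter


def consensus(aligned, threshold=0.5):
    """Column-wise majority base; 'N' where no base exceeds the threshold.

    Assumption: a strict-majority rule, which the paper does not specify.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) > threshold else "N")
    return "".join(out)


def conserved_trace(*domain_consensus):
    """Keep a site only if every domain consensus carries the same base."""
    trace = []
    for column in zip(*domain_consensus):
        same = len(set(column)) == 1 and column[0] != "N"
        trace.append(column[0] if same else "N")
    return "".join(trace)


# Hypothetical four-site alignments, one list per species domain.
archaea = ["GCGA", "GCGA", "GCUA"]
bacteria = ["GCGU", "GCGU"]
eukarya = ["GCAA", "GCAA", "GCGA"]

trace = conserved_trace(consensus(archaea), consensus(bacteria), consensus(eukarya))
print(trace)
```

In this toy case only the first two sites survive in the trace, which mirrors the loss of resolved sites that the text attributes to the consensus approach.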

Sites conserved by virtually all tRNA species in this survey were credited to a
primordial tRNA adaptor that predated tRNA diversification, deep within the pre-LCA era.
The remaining, non-universal sites, strictly conserved across species domains in a given
tRNA species, provided information relating to pre-LCA tRNA diversification (Fig. 7).

FIGURE 7

LCA tRNA types and subtypes

Types and subtypes of LCA tRNA were identified from bases in the D stem (sites 8-15, 
20A-26), its surround, and variable loop (sites 41-48) of sequences in Fig. 1. Bases at 
these sites form a set of interactions that had been shown to characterize the core 
structure of each kind of tRNA L form in thermophilic archaea and related species. [29] 
Generic sequences for each type, and subtype, found in LCA tRNA were constructed from 
consensus sequences for all tRNA attributed with the same core structure; sites not
strictly conserved across the three species domains in a tRNA species were determined 
by majority rule. 

Evidence that a family of synthetically related aa preferentially charged one 
type/subtype of LCA tRNA was obtained from core homology among adaptors for same- 
family aa. For this purpose, the probability, p, was determined for chance occurrence of 
the frequency exhibited by the principal type/subtype of tRNA adaptor in an aa family. 



$$p(x) = \binom{n}{x}\, v^{x} (1 - v)^{n - x} \tag{1}$$



x is the number of aa with an adaptor having the principal form of core structure, in a
family with n members; each kind of aa charged only one type/sub-type of tRNA. v 
signifies the normalized frequency of the principal type/subtype of tRNA, among adaptors 
for the twenty standard aa. The term in brackets, in Eqn. (1), is a binomial coefficient. 
Probabilities for each family were combined by method of Fisher, [37] to obtain the 
global probability for tRNA homology. 

An analogous procedure was adopted to determine homology among p-aaRS within aa 
families. 
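
As a worked illustration of the binomial probability in Eqn. (1), the sketch below evaluates the point probability of x successes in n trials at frequency v. The example figures (7 of 9 tRNA, frequency 0.167) are those listed for tree region 1 in Appendix Table 1A; taking v as exactly 1/6 is an assumption about the unrounded value.

```python
from math import comb, log


def binomial_point_probability(x: int, n: int, v: float) -> float:
    """Probability of exactly x occurrences in n trials at frequency v."""
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    return comb(n, x) * v**x * (1.0 - v) ** (n - x)


# Tree region 1 of Table 1A: 7 Glu family adaptors among 9 tRNA,
# with v assumed to be 1/6 (listed as 0.167).
p_region1 = binomial_point_probability(7, 9, 1 / 6)
print(f"p = {p_region1:.3e}, -2 ln p = {-2 * log(p_region1):.3f}")
```

Under that assumption the result is close to the tabulated -2 ln p of 18.644; the small residual difference presumably reflects rounding of v.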

Phylogenetic analysis 

Jukes-Cantor distances.- Phylogenetic kinship between tRNA paralogs, in the LCA, was
assessed from Jukes-Cantor distances [40] determined at jointly conserved,
non-universal sites (Fig. 7). The distance, d_ij, between the trace of LCA tRNA species
conserved in extant sequences (Fig. 1)

$$d_{ij} = -\frac{3}{4} \ln\left(1 - \frac{4 s_{ij}}{3 n_{ij}}\right) \tag{2}$$

excluded anticodon and variable loop sites. s_ij is the number of sites with dissimilar
bases, and n_ij the total number of relevant sites; i and j refer to tRNA species. The
logarithmic argument was non-positive in 5 per cent (120/2403) of sequence
comparisons, reflecting a chance excess difference ratio (s_ij/n_ij > 0.75) between randomly
related sequences. In this event, the argument was reset to 1x10^-3, yielding a difference
ratio, s_ij/n_ij ≈ 0.749.
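
The distance calculation, including the reset applied when the logarithmic argument is non-positive, can be sketched as below. The floor value of 1e-3 is an assumption: it is the power of ten that reproduces the stated effective difference ratio of about 0.749, since 0.75 × (1 - 0.001) = 0.74925.

```python
import math


def jukes_cantor(s: int, n: int, floor: float = 1e-3) -> float:
    """Jukes-Cantor distance from s dissimilar sites among n shared sites.

    A non-positive logarithmic argument (s/n >= 0.75) is reset to `floor`.
    """
    if n <= 0 or not 0 <= s <= n:
        raise ValueError("require n > 0 and 0 <= s <= n")
    argument = 1.0 - 4.0 * s / (3.0 * n)
    if argument <= 0.0:
        argument = floor
    return -0.75 * math.log(argument)


print(jukes_cantor(1, 18))   # one mismatch over eighteen shared sites
print(jukes_cantor(30, 36))  # s/n > 0.75, so the floor applies
```

The second call returns the maximum distance the floor allows, so every comparison between effectively unrelated traces is capped at the same value.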

Adaptation of the mutation distance metric to quantify differences between paralogs at 
mutually conserved sites provides a means to apply phylogenetic calculations to pre-LCA 
era molecular evolution. The distance obtained represents an estimate of the actual 
distance between both molecular species when they existed in the LCA, since it reflects 
differences over the available sample of sites, rather than the complete set. Comparing 
cladograms based on full-sequence versus mutually-conserved-trace distance in tRNA 
from archaeal species illustrates the efficacy of this procedure. Trace distances are seen 
to conserve the stacking order of aa identity in the Methanococcus tRNA tree in Fig. 8. 

FIGURE 8 
The Kendall rank correlation [42] for the stacking order in both trees was τ = 0.97, p =
8.8x10"^. Consolidation of tRNA ortholog sequences, to obtain the conserved trace,
pruned the number of branch tips by broadly half (16/37). Post-LCA sequences whose 
length differs, as a result of insertions and deletions, are commonly aligned; however, 
subsequent phylogenetic analysis is routinely based on the complete set of sites in a 
standard sequence and this distinguishes it from the present pre-LCA procedure. 

Neighbor-joining calculations, [39] involving trace Jukes-Cantor distances (Eqn. 2) for 
LCA tRNA (Fig. 1), led to a tree depicting pre-LCA tRNA diversification. The tRNA tree 
root was located using attached aa path-distance to estimate the time-order of events
in pre-LCA tRNA diversification; [13, 15] analysis of tRNA paralogs from a single source
(LCA) precluded use of outgroups. The distribution of aa identity, core structure type,
and anticodon bases within the tRNA tree obtained was evaluated statistically for
evidence of cluster formation.

Identity.- Jukes-Cantor distance solely reflects the fraction of dissimilar sites between
two sequences, s_ij/n_ij (Eqn. 2). As this measure has no direct dependence on sequence
size, it provides an intensive measure of sequence relatedness. When size differences,
Pij, are negligible, it is an intuitively reasonable indicator of evolutionary relatedness;
distance being an inverse expression of relatedness. The trace of LCA tRNA conserved in
extant sequences, however, may differ widely between tRNA species. In the present
survey, the number of sites, n_ij, mutually shared in the conserved trace of a pair
of LCA tRNA species ranged up to 36 sites.

The matching probability (Eqn. 1) depends on both the number of identical sites, x, and
total number of mutually conserved sites, n_ij, in ancestral sequences. Unlike Jukes-Cantor
distances, therefore, it provides an extensive measure of sequence relatedness. This is
relevant to investigations on pre-LCA evolution reliant on the conserved trace of
ancestral molecules. The negative logarithm (base 4) of the matching probability (Eqn. 1),

$$I_{ij} = -\log_4 p(x_{ij}) \tag{3}$$

furnishes an additive measure of sequence relatedness. n_ij, the total number of
non-universal sites conserved jointly by tRNA species i and j (Eqn. 2), corresponds to n in
Eqn. 1. The number of identical sites is x = x_ij = n_ij - s_ij. The random chance, v, that a
given kind of base will occur at any site equals 1/4, in a ribopolynucleotide sequence (A,
G, C, U).

Sequence identity, I_ij, is expressed in quaternary units (quarts): I_ij = -log_4 p(x_ij = n) =
-log_4 4^-n = n quarts, when identical bases occur at n relevant sites. Quaternary identity
has the form of a measure of uncertainty (defined within a multiplicative constant), with
well known mathematical properties. [46] A conserved tRNA trace of n sites with an
identity of n quarts implies that an uncertainty (entropy) corresponding to n choices
among four kinds of bases is removed, on matching with a reference trace that specifies
the identity of all sites in the test trace.

It may be noted that a 4^n-fold difference exists in matching probability among tRNA
paralogs with n and 2n identical sites (s = 0): p(n) = 4^-n and p(2n) = 4^-2n, giving
p(n)/p(2n) = 4^-n/4^-2n = 4^n. A large 'sequence size' dependent difference in tRNA
relatedness thus occurs. In contrast, Jukes-Cantor distances are equal for both pairs of
tRNA. From Eqn. 2, d(n, s = 0) = 0, and d(2n, s = 0) = 0. Jukes-Cantor distance has,
consequently, overestimated sequence relatedness between the species with a smaller
jointly conserved trace.
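
The contrast between an extensive identity measure and an intensive distance can be illustrated numerically. The sketch below assumes the matching probability is the binomial point probability with v = 1/4, as described in the text; the function names are introduced here, and the 17-of-18 example uses the site counts given for Asn- and Met-tRNA in the Figure 7 legend.

```python
from math import comb, log


def matching_probability(x: int, n: int, v: float = 0.25) -> float:
    """Chance of exactly x identical sites among n jointly conserved sites."""
    return comb(n, x) * v**x * (1.0 - v) ** (n - x)


def quaternary_identity(x: int, n: int) -> float:
    """Identity in quarts: negative base-4 logarithm of the matching probability."""
    return -log(matching_probability(x, n), 4)


# Fully matching traces of 18 and 36 sites share a Jukes-Cantor distance
# of zero, yet their identities differ by a factor of two.
print(quaternary_identity(18, 18), quaternary_identity(36, 36))

# Seventeen identical sites out of eighteen (counts from the Fig. 7 legend).
print(round(quaternary_identity(17, 18), 3))
```

A fully matching trace of n sites scores n quarts, so doubling the trace length doubles the identity, whereas the corresponding distance is zero in both cases.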

Results in Fig. 9 demonstrate, as anticipated, that quaternary identity has an inverse
relation with mutation distance. A linear fall-off (slope, -8.5 quarts/unit distance) in
identity (Eqn. 3) accompanies increases in Jukes-Cantor distance (Eqn. 2), over a range
that included 77 per cent of the forty-eight LCA tRNA species examined.

FIGURE 9

Quaternary identity levels (Eqn. 3) helped identify tRNA ancestral to LCA tRNA species. 
Adaptor and ancestral tRNA were time-ordered from attached aa 'code age', inferred 
from path-distance. [13, 15] A tree was obtained portraying pre-LCA tRNA diversification 
in relation to stages identified in code evolution. 

Codon contiguity 

Contiguity was statistically evaluated for base triplets encoding: (1) same-family aa, (2) 
same-region aa in tRNA tree, and (3) same-domain aa in code. Nearest-neighbor triplet 
frequency between a designated aa and other members of the same set was appraised 
from probability in a hypergeometric distribution,

$$p(x) = \binom{a}{x} \binom{b}{n - x} \Big/ \binom{a + b}{n} \tag{4}$$

where p(0 < x ≤ n) is the probability for chance occurrence of x nearest-neighbor
triplets between a specified aa and other same-set members, when they are encoded by
a total of n base triplets. A total of a triplets are nearest-neighbor to triplets of the given
aa, and a total of b triplets are non-contiguous with its codons. The quantities in
parentheses are binomial coefficients. p(x = 0) was defined to equal one. Three STOP
signal triplets were omitted from these calculations. Codons differing solely by a
3'-pyrimidine were treated as indistinguishable. Codon-contiguity probabilities for each of
the standard set of aa were combined by the method of Fisher. [37]
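
A short numerical check of the hypergeometric contiguity probability is given below. It takes the parameters a, b, x and n as defined above, with values read from the family columns of the Glu and Val rows in Table 2; the function name is introduced here for illustration.

```python
from math import comb, log


def contiguity_probability(a: int, b: int, x: int, n: int) -> float:
    """Hypergeometric point probability of x nearest-neighbor triplets.

    a: triplets one substitution from the codons of the given aa
    b: triplets more than one substitution away
    n: triplets encoding the other members of the set
    By the stated convention, p(x = 0) is defined as one.
    """
    if x == 0:
        return 1.0
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)


# Family columns of Table 2: Glu (a=11, b=32, x=2, n=6) and Val (a=18, b=24, x=8, n=8).
p_glu = contiguity_probability(11, 32, 2, 6)
p_val = contiguity_probability(18, 24, 8, 8)
print(f"{-2 * log(p_glu):.2f} {-2 * log(p_val):.2f}")
```

Both results agree with the tabulated -2 ln p entries (2.25 and 15.80), which supports reading the probability as a point probability rather than a cumulative tail.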

Abbreviations 

A, adenine; C, cytosine; G, guanine; U, uridine; N, a standard nucleotide; R, purine; Y, 
pyrimidine; aa, amino acid; Ala, A, alanine; Arg, R, arginine; Asn, N, asparagine; Asp, D, 
aspartate; Cit, citrulline; Cys, C, cysteine; f-Met, N-formyl-methionine; Glu, E, 
glutamate; Gln, Q, glutamine; His, H, histidine; Ile, I, isoleucine; Leu, L, leucine; Lys, K,
lysine; Met, M, methionine; Orn, ornithine; Phe, F, phenylalanine; Pro, P, proline; Sec, 
selenocysteine; Ser, S, serine; Thr, T, threonine; Trp, W, tryptophan; Tyr, Y, tyrosine; 
Val, V, valine; α-KG, α-ketoglutarate; OAA, oxaloacetate; 3PG, 3-phosphoglycerate; Pyr,
pyruvate; Shi, shikimate; p, probability; SEM, standard error about mean; tRNA-I, type 
I, -II, type II, -A, core group A, -A', modified core group A, -B, group B, -C, group C, -D, 
group D, -D', modified core group D; Fd, ferredoxin; p-aaRS, protein-aminoacyl-tRNA 
synthetase; r-aaRS, ribozyme-synthetase. 

Notation 

Amino acid superscripts refer to stage of code evolution at aa addition, based on effective 
number of reaction steps from citrate cycle in synthesis pathway. 

Table 1. Quaternary identity between conserved trace of LCA tRNA sequences^

^ Upper-right half of table gives identity in quaternary units (see text) among LCA tRNA sequences (Fig. 1).
Lower-left half, number of identical sites over total number of jointly conserved, non-universal tRNA sites. 
Identities for tRNA charged by same-family aa are enclosed within triangles. Bold identity values 
mark a tRNA and its putative ancestor. Superscripts on aa give their 'code age' in the path-distance 
model. Italics indicate Asn tRNA was likely ancestor, prior to subdivision of code box AAN (read initially 
with anticodon triplet, UUU) and capture of AAR by a Lys adaptor. 

Table 2 - Codon contiguity in amino acid families, tRNA tree regions and code domains^1

Amino acid^2                            Contiguity parameters^3
                                        Precursor                 Tree region               Code domain
Pre-      Reg-    Dom-
cursor    ion     ain     aa   codons     a   b   x   n  -2ln p     a   b   x   n  -2ln p     a   b   x   n  -2ln p
α-KG      1       D       Glu  GAR       11  32   2   6    2.25    11  32   3   7    7.35    11  32   2   6    2.25
α-KG      1       D       Gln  CAR       11  32   5   6   12.04    11  32   5   7    9.89    11  32   5   6   12.04
α-KG      1       D       Pro  CCN       18  24   3   5    2.66    18  24   3   6    2.31    18  24   3   5    2.66
α-KG      1       D       His  CAY        8  36   3   7    4.91     8  36   4   8    7.52     8  36   3   7    4.91
Pyr       2       A2      Ala  GCN       18  24   3   8    2.45    18  24   3   3    5.29    18  24   3   3    5.29
Pyr       2       A2      Val  GUN       18  24   8   8   15.80    18  24   3   3    5.29    18  24   3   3    5.29
Pyr       4       N       Leu  CUN/UUR   22  18   3   6    2.23    19  21   3   9    3.30     -   -   -   -       -
OAA       1       N       Asp  GAY        8  36   3  10    2.46     8  36   3   8    4.26     -   -   -   -       -
OAA       3       A1      Asn  AAY        8  36   5  11    6.16    11  33   4  13    2.81    11  33   4  13    2.81
OAA       3       A1      Thr  ACN       18  24   6   9    6.34    18  24   8  11    7.56    18  24   8  11    7.76
OAA       3       A1      Met  AUG        8  36   5  11    6.16     8  36   5  13    6.84     8  36   5  13    6.84
OAA       3       A1      Ile  AUY/A     13  30   6  10    4.70    13  30   5  12    3.53    13  30   5  12    3.53
OAA       3       A1      Lys  AAR       13  30   7  10    9.61    11  32   7  12   10.88    11  32   7  12   10.88
OAA       3       A1      Arg  AGR/CGN    8  36   3  14    2.79    21  19   6   9    3.30    21  19   6   9    3.30
3PG       4       A3      Cys  UGY        7  37   4   8    8.68     7  37   4  13    4.96     7  37   1   1    3.68
3PG       4       N       Gly  GGN       17  25   3   6    2.42    17  25   3  11    3.52     -   -   -   -       -
3PG       4       A3      Trp  UGG        6  38   3   8    5.74     6  38   4  13    6.11     6  38   1   1    3.99
3PG       4       N       Ser  UCN/AGY   21  20   3   5    2.17    21  20   5  10    2.54     -   -   -   -       -
Shi       5       B       Phe  UUY        8  36   1   1    3.41     8  36   1   1    3.41     8  36   1   1    3.41
Shi       5       B       Tyr  UAY        6  38   1   1    3.99     6  38   1   1    3.99     6  38   1   1    3.99

Codon contiguity:  χ²(40) = -Σ 2 ln p = 107.0    χ²(40) = -Σ 2 ln p = 104.9    χ²(32) = -Σ 2 ln p = 82.9
                   p = 5.09x10^-8                p = 1.02x10^-7                p = 2.16x10^-6

Correlation^4: τ(precursor v. region) = 0.92, p = 7.0x10'^
τ(region v. domain) = 0.72, p = 8.3x10"^
τ(precursor v. domain) = 0.68, p = 1.5x10"^

^1 Codons differing by a 3'-pyrimidine were treated as indistinguishable. STOP codons were omitted.

^2 Tree regions are from Fig. 3. Code domains are from Fig. 6. N, no domain contained these amino
acid adaptors. A1, A2 and A3 refer to three distinct domains with type IA related tRNA adaptors.

^3 Nearest-neighbor codon frequencies were assessed by hypergeometric probabilities, p(0 < x ≤ n), and
p(x = 0) = 1, for each aa versus members of same family, each tRNA within same tree region
and for each aa within a code domain. a, number of triplets a single base substitution from
codons of a specified aa. b, number of codons more than one substitution from codons for this
aa. x, number of codons read by adaptors for related aa one substitution from a
specific aa codons. n, total number of codons assigned to the designated aa. Codon
contiguity probabilities for each aa were combined by Fisher's method. [37]

^4 Kendall correlation adjusted for tied ranks. Probabilities were from the normal distribution. [42]

Figure legends

Fig. 1. LCA tRNA base sequences conserved in extant tRNA species. Amino acid identity, 
core structure type and anticodon of each tRNA is shown. Base sequences were 
reconstructed from consensus sequences representative of an indicated number of sources 
in Archaea (A), Bacteria (B), and Eukarya (E) (see Appendix B, tRNA sources and access
numbers (Bayreuth database)). Sites conserved across species domains, in a tRNA 
species, are highlighted in its sequence. Highlighted in the overbar are sites conserved 
across species domains in the consensus sequences of virtually all tRNA species. Modified 
bases appear in their unmodified form. N, a site with no assigned base. Dashed lines 
enclose D-stem and variable loop bases in the core of a tRNA L-form. The conserved trace 
of LCA tRNA in thirty-nine type I tRNA species charged by eighteen aa is shown. They are 
divided into subtypes IA, IB, and ID. Modified core groups IA' and ID' also occur. Nine type
II tRNA conserved traces of LCA adaptors are also shown. A generic sequence representative of each type,
and subtype, of LCA tRNA is shown; bold letters mark conserved bases. Italic letters mark 
a gap (sites, 45-46) in type II tRNA sequences, indicating the location of supernumerary 
variable-loop bases. Amino acid superscripts specify stage at addition to the genetic code 
in path-distance model. This survey included 1100 tRNA sequences obtained in a search of 
the Bayreuth database (http://www.uni-bayreuth.de/departments/biochemie/tRNA/). 

Fig. 2. Subtypes of LCA type I tRNA. Cloverleaf structures of LCA tRNA with IA, IB and ID
core structure are shown. Bold letters mark D stem and variable-loop bases conserved 
over three species domains. Lines connect base pairs and triples in tRNA L form. A dashed 
line depicts a non-standard purine-purine bond at 26:44. Solid circles highlight anticodon 
triplet. tRNA secondary structures depicted are based on generic LCA tRNA sequences in 
Fig. 1. 

Fig. 3. Dendrogram depicting tRNA diversification in the pre-LCA era. The tree contains
forty-eight tRNA species and was constructed from Jukes-Cantor distances at jointly
conserved, non-universal sites in the conserved trace of LCA tRNA. Amino acid identity,
core type and anticodon bases (3' -> 5') of tRNA are indicated. The tree root is located at
branch 1, which bifurcates into a cluster of tRNA charged by NH4+-fixer aa (Asp^, Glu^,
Gln^), identified as first-generation aa in the path-distance model of code evolution. [13,
15] Same-family aa preferentially charged tRNA branching within tree regions rooted at
branches 1 to 4. Orange branches indicate Asp family tRNA adaptors; blue, Glu family;
green, Pyr family; brown, Ser family; and, red, Shi family. Number of tRNA charged by
synthetically related aa to total number is given for each tree region. Branch lengths in
region-1 and -2 are reduced 10-fold. This tree was constructed with sequences in Fig. 1 by
the neighbor-joining method. [39]

Fig. 4. Correlation between aa code age and adaptor distance in the LCA tRNA tree. Amino 
acid code age increases linearly with adaptor distance from the tree root, between stage 
2-7 of code evolution. The regression equation slope, β, exceeds zero with a probability,
p(β > 0) = 0.948. This indicates the inferred pre-LCA tRNA phylogeny supports the path-
distance model of code evolution, based on coordinated growth of aa synthesis pathways 
and tRNA diversity. Dots show distance of tRNA adaptors for aa designated by single letter 
abbreviations (see text). Squares show mean distance, with standard error bars, of 
indicated tRNA. R/W, tRNA-I charged by post-expansion phase (stage 9-14) additions to 
code: Arg^, Lys^°, Phe", Tyr^S His^-^, Trp^"^. S, type II tRNA Ser"^ adaptors. Distances are 
from LCA tRNA tree in Fig. 3. 

Fig. 5. Pre-LCA tRNA diversification during code evolution, (a) Tree nodes join an aa 
adaptor with its putative ancestor, inferred from trace identity. The ordinal length of each 
branch corresponds to the stage of code evolution a tRNA species originated, estimated 
from attached aa path-distance. A pre-code stage at the tree root involves a 'universal' 
type-ID tRNA adaptor for each NH4+ fixer aa. By stage-2, tRNA-ID adaptors specific for
Asp^, Glu^, or Gln^, and tRNA-IA adaptor for Asn, formed the NH4+ Fixers Code. tRNA
adaptors for fourteen different aa, with type IA, IA', IB, ID' and II core structure, showed
kinship with Asn-tRNA-IA. Proline"^ and His^^ tRNA-ID descent from Gln-tRNA-D was
favored by phylogenetic evidence. LCA tRNA partition into two deeply branching domains
rooted at Asn-tRNA-IA, or the primordial NH4+ Fixer-tRNA-ID. Single letter abbreviations
designate aa. Amino acid families are color-coded as in Fig. 3 , branch predates NH4+-
Fixers Code (stage 2). , post-'NH4+-Fixers Code' branches, (b). Tree depicting evolution
of tRNA types and subtypes. At stage-2, type IA tRNA (Asn^-adaptor) is shown to diverge
from tRNA-ID. tRNA with modified type IA core (Cys^- and Val^-adaptors) and type II tRNA
(Ser"*- adaptors) diverged from an Asn-adaptor-like tRNA by stage-5. A dashed line 
indicates divergence from a tRNA-IA ancestor (presumptive ancestral Ser-tRNA) of the 
Gly-adaptors, with a type ID' core. Formation of type IB tRNA-I (aromatic aa adaptors) 
from tRNA-IA is placed at stage-11 of code evolution. 

Fig. 6. Location of domains, LCA tRNA tree regions and aa families in the Standard Code. 
Each domain contains contiguous codons read by tRNA adaptors, displaying core group 
homology, for same-family aa. The overlap between domains and tree regions indicates
adaptors for same-family aa descended from a common ancestor. Roman numerals and 
upper-case letters denote tRNA type and subtype, respectively. Arrows connect an adaptor 
and its putative ancestor. Arrow number gives identity (quarts) between conserved trace 
of adaptor and ancestor. Dashed arrows connect adaptor to ancestral Asn-adaptor, 
cognate with codons AAN. Adaptor/ancestor pairs occur mainly within code domains. Tan 
background marks Asp domain; blue, Glu domain; green, Pyr domain; yellow, Ser domain; 
and, red, Shi domain. Members of aa families are color-coded as in Fig. 3. Bold blue lines 
enclose tree region-1, with Glu family and Asp tRNA; green lines delineate region-2 for Pyr 
family adaptors; orange lines, region-3 for Asp-derived aa; brown lines, region-4 for Ser 
family and Leu; and red lines, region-5 for Shi family adaptors. This figure draws on 
results in Fig. 1 and 3, and Table 1. 

Figure 7. Depicts location of jointly conserved, non-universal sites in LCA tRNA.
Highlighted sites contained an identical base, conserved across species domains, in 
consensus sequences for Asn- and Met-tRNA. Modified bases appear in their unmodified 
form. Highlighted in the overbar are sites universally conserved among tRNA species. Solid 
triangle, an identical base at a non-universal site shared by the conserved trace of each 
LCA tRNA. Open triangle, a non-universal site occupied by a different base in each tRNA 
trace. Italics denote anticodon triplet. The trace of pre-LCA Asn- and Met-tRNA share 
eighteen non-universal, conserved sites. Seventeen are identical, consistent with close 
pre-LCA kinship. tRNA sequences are from Fig. 1.

Figure 8. Compares cladogram based on distances for full-sequence and conserved-trace 
comparisons in Methanococcus tRNA. a. Sequence cladogram for sixteen tRNA species, 
charged by fifteen kinds of aa. b. Conserved-trace cladogram for non-universal sites in 
tRNA conserved between Methanococcus species. Phylogenetic relations are shown to be
highly conserved under consolidation of tRNA sequences from different Methanococcus
species and restriction to jointly-conserved, non-universal tRNA sites. Amino acid identity
stacking order in both cladograms had a Kendall correlation coefficient, [42] τ = 0.97 (p =
8.8x10'^). Broken lines mark nodes that differ from the standard, full-sequence cladogram.
Phylogenetic analysis was performed by neighbor-joining. [39] tRNA were from M.
jannaschii (mj), M. vannielii (mva) and M. voltae (mvo). tRNA core structure group and
anticodon bases (3' -> 5') are specified. Branch color indicates family of acylating aa, as 
indicated under Fig. 3. Methanococcus tRNA sequences were from the Bayreuth database. 

Figure 9. Depicts inverse relation between quaternary identity and Jukes-Cantor distance 
among LCA tRNA. Identity decreased linearly as mutation distance increased (goodness of 
fit, r^ = 0.60), over a range encompassing thirty-seven of forty-eight (77 per cent) tRNA 
species. Identity (quarts) and distance values are for Asn-tRNA versus forty-eight tRNA 
species in Table 1 and Fig. 3. 

Appendix A: Distribution of amino acid identity, core group and anticodon bases in LCA
tRNA tree

Data relating to the distribution of aa identity, core group and anticodon 3'- and mid-base,
in cluster-regions identified in the LCA tRNA tree (Fig. 3), are presented in Tables 1A to
4A. Statistical analysis of these data is provided, and they are discussed in this paper (see 
Channeled diversification of tRNA species in the pre-LCA era). 

Table 1A. Distribution of adaptors for same-family amino acids in LCA tRNA tree regions (a)

Tree     Amino acid   Total tRNA   tRNA for aa    Frequency of    Probability (b)   -2 ln p
region   family       (no.)        family (no.)   specific tRNA   p

1        Glu          9            7              0.167           8.942×10⁻⁵        18.644
2        Pyr          8            6              0.229           2.412×10⁻³        12.055
3        Asp          18           14             0.375           5.078×10⁻⁴        15.171
4        Ser          13           7              0.188           4.022×10⁻³        11.032
5        Shi                                      0.042           1.0

Total:                48           34                             χ²(df = 10):      56.902
                                                                  Combined probability: 1.391×10⁻⁸

(a) Data are from tRNA tree in Fig. 3.

(b) Binomial probability (Eqn. 1) for number of specific aa adaptors in designated LCA tRNA
tree region. Probabilities were combined by Fisher's method. [37]
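Fisher's method, used to combine the per-region probabilities in Tables 1A to 4A, can be sketched as below. The statistic is the sum of -2 ln p over the k regions and is referred to a chi-square distribution with 2k degrees of freedom. The function name `fisher_combine` is hypothetical, and the input values are the Table 1A probabilities with exponents taken to be those consistent with the tabulated -2 ln p column (an assumption of this sketch, since the printed exponents did not survive extraction).

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values by Fisher's method.

    Returns (statistic, degrees_of_freedom, combined_p).
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k
    # For even df = 2k the chi-square upper tail has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, df, math.exp(-half) * series


# Table 1A probabilities for regions 1-5 (exponents assumed, see lead-in).
table_1a = [8.942e-5, 2.412e-3, 5.078e-4, 4.022e-3, 1.0]
statistic, df, combined_p = fisher_combine(table_1a)
```

A region with p = 1.0 contributes nothing to the statistic but still adds two degrees of freedom, which is why five regions give df = 10 in each table. Run on the values above, the sketch should come out close to the tabulated χ² and combined probability.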



Table 2A. Distribution of tRNA types and subtypes between regions in LCA tRNA tree (a)

Tree     Type/subtype   Total tRNA   Specific tRNA   Frequency of    Probability (b)   -2 ln p
region   of tRNA        (no.)        (no.)           specific tRNA   p

1        ID             9            8               0.250           1.030×10⁻⁴        18.363
2        IA             8            7               0.521           3.984×10⁻²        6.446
3        IA             18           15              0.521           5.050×10⁻³        10.577
4        II             13           7               0.188           4.022×10⁻³        11.032
5        IB                                          0.042           1.0

Total:                  48           37                              χ²(df = 10):      46.416
                                                                     Combined probability: 1.206×10⁻⁶

(a) Data are from tRNA tree in Fig. 3.

(b) Binomial probability (Eqn. 1) for number of specific kinds of tRNA in designated
tree region.
Table 3A. Distribution of tRNA with specific 3'-anticodon base in regions of LCA tRNA tree (a)

Tree     Anticodon   Total tRNA   tRNA with      Frequency of   Probability (b)   -2 ln p
region   3'-base     (no.)        given nt. 36   tRNA with      p
         (nt. 36)                 (no.)          nt. 36

1        G           9            6              0.271          1.284×10⁻²        8.710
2        C           8            7              0.208          1.078×10⁻⁴        18.271
3        U           18           12             0.292          8.896×10⁻⁴        14.049
4        A           13           5              0.188          5.665×10⁻²        5.742
5        A                                       0.188          1.0

Total:               48           37                            χ²(df = 10):      46.772
                                                                Combined probability: 1.039×10⁻⁶

(a) Data are from tRNA tree in Fig. 3.

(b) Binomial probability (Eqn. 1) for number of tRNA with specific 3'-anticodon base (nt.
36) in tree region.



Table 4A. Distribution of tRNA with specific mid-anticodon base in regions of LCA tRNA tree (a)

Tree     Anticodon   tRNA in   tRNA with      Frequency of   Probability (b)   -2 ln p
region   mid-base    region    given nt. 35   tRNA with      p
         (nt. 35)    (no.)     (no.)          nt. 35

1        U           9         5              0.208          7.212×10⁻²        5.259
2        A           8         4              0.271          0.106             4.480
3        A           18        5              0.271          0.206             3.163
4        C           13        6              0.250          5.592×10⁻²        5.768
5        -                                    -              1.0

Total:               48        37                            χ²(df = 10):      18.670
                                                             Combined probability: 4.466×10⁻²

(a) Data are from tRNA tree in Fig. 3.

(b) Binomial probability (Eqn. 1) for number of tRNA with specific mid-anticodon base
(nt. 35) in tree region.



Appendix B: tRNA sources and access numbers (Bayreuth database) 
Table 1B.





Amino acid    Anticodon 3'-5'    Domain    Species    Access no.    (column set repeated in four blocks across the page)


1 


Asp^ CUG ARCHAEA 


Archaeglobus 
Fulg.


DD0340 


1 Pro" GGU ARCHAEA 

2 

3 

4 

5 

6 

7 

1 GGG 

2 

3 

1 GGC 

2 

1 GGU EUBACT. 

2 

3 

4 

5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 

1 GGG 
2 


Archaeglobus 
Fulg. 


DP0341 


1 Val= CAC EUBACT. 

2 

1 CAU EUKARYA 

2 

3 

4 

5 

6 

1 

CAC 


Treponema 
Pallidum 


DV1272 


1 Leu' AAC EUBACT. 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

1 AAU 

2 

3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
1 GAU EUKARYA 


Acholeplasma 
Laid. 


DL1230 


2 


Methanococcus 
Jan. 


DD0650 


Methanococc 
us Jan. 


DP0650 


Staphylococ 
. Aure. 


DV1481 


Treponema 
Pallidum 


DL1272 


3 


Methanococ. 
Vani. 


DD0660 


Methanococ. 
Vani. 


DP0660 


Plasmodium 
Falcip.


DV7500 


Borrelia 
Burgdorf. 


DL1281 


4 


Methanotherm. 
Fer.


DD0680 


Methanotherm.
Fer.


DP0680 


Trypanosom 
a Brucei 


DV7520 


Staphylococ. 
Aure. 


DL1482 


5 


Methanococ 
Voltae 


DD0740 


Methanococ. 
Voltae 


DP0740 


Leishmania 
Tarent. 


DV7550 


Helicobacter 
Pylo. 


DL1511 


6 


Thermococcus
Celer 


DD0940 


Thermococcus
Celer


DP0940 


Dictyosteliu 
m Dis. 


DV7571 


Bacillus 
Subtilis 


DL1543 


7 


Haloferax 
Volcanii 


RD0500 


Haloferax 
Volcanii 


RP0502 


Saccharomy 
ces Cer. 


DV7632 


E.Coli 


DL1662 


1 EUBACT. 


Mycoplasma 
Capric. 


DD1140 


Archaeglobus 
Fulg.


DP0342 


Leptomonas 
Collos. 


DV7710 


Haemophilus 
Influ. 


DL2003 


2 


Mycoplasma 
Gen. 


DD1150 


Methanococc 
us Jan. 


DP0651 


Trypanosom 
a Brucei 


DV7521 


Synechocystis
Sp.


DL2142 


3 


Mycoplasma 
Mycoid. 


DD1180 


Haloferax 
Volcanii 


RP0501 


2 


Thr' UGU ARCHAEA 


Saccharomy 
ces Cer. 


RV7631 


Mycoplasma 
Capric. 


RL1140 


4 


Mycoplasma 
Pneumo. 


DD1200 


Archaeglobus 
Fulg.


DP0340 




Mycoplasma 
Capric. 


RL1141 


5 


Acholeplasma 
Laid. 


DD1230 


Haloferax 
Volcanii 


RP0500 


1 


Archaeglobus
Fulg.


DT0342 


Rhodospiril.R 
ub. 


RL2020 


6 


Spiroplasma 
Melif. 


DD1260 


Mycoplasma 
Capric. 


DP1140 


2 
3 

4 
5 

1 UGG 

2 

3 

4 

5 

6 

1 UGC 

2 

1 UGU EUBACT. 

2 

3 

4 

5 

6 

7 

8 


Methanococ 
cus Jan. 


DT0650 


Anacystis 
Nidulans 


RL2100 


7 


Treponema 
Pallidum 


DD1270 


Mycoplasma 
Gen. 


DP1150 


Methanococ 
Vani. 


DT0660 


Bacillus 
Stearo. 


RL2120 


8 


Borrelia 
Burgdorf. 


DD1280 


Mycoplasma 
Mycoid. 


DP1180


Methanotherm.
Fer.


DT0680 


Mycoplasma 
Capric. 


DL1140 


9 


Streptomyces 
Liv. 


DD1350 


Mycoplasma 
Pneumo. 


DP1200 


Methanococ 
Voltae 


DT0740 


Mycoplasma 
Gen. 


DL1150 


10 


Staphylococ. 
Aure. 


DD1480 


Spiroplasma 
Melif. 


DP1260 


Archaeglobus
Fulg.


DT0340 


Mycoplasma 
Pneumo. 


DL1200 


11 


Staphylococ. 
Aure. 


DD1481 


Borrelia 
Burgdorf. 


DP1280 


Methanococ 
cus Jan. 


DT0651 


Acholeplasma 
Laid. 


DL1232 


12 


Lactobac.Bulga 
ric. 


DD1500 


Staphylococ. 
Aure. 


DP1480


Methanococ 
.Vani. 


DT0661 


Treponema 
Pallidum 


DL1270 


13 


Helicobacter 
Pylo. 


DD1510 


Lactobac.Bul 
garic. 


DP1500 


Thermococc 
us Celer 


DT0940 


Borrelia 
Burgdorf. 


DL1283 


14 


Bacillus 
Subtilis 


DD1540 


Helicobacter 
Pylo. 


DP1511 


Halobacteri 
um Cut. 


RT0380 


Streptomyces 
Coel. 


DL1310 


15 


Bacillus Sp. 
Ps3 


DD1570 


Bacillus 
Subtilis 


DP1540 


Haloferax 
Volcanii 


RT0501 


Staphylococ. 
Aure. 


DL1481 


16 


E.Coli 


DD1660 


E.Coli 


DP1660 


Archaeglobus
Fulg.


DT0341 


Helicobacter 
Pylo. 


DL1510 


17 


Haemophilus 
Influ. 


DD2000 


Salmonella 
Typhi. 


DP1700 


Haloferax 
Volcanii 


RT0500 


Bacillus 
Subtilis 


DL1541 


18 


Haemophilus 
Influ. 


DD2001 


Photobact. 
Phosph. 


DP1740 


Mycoplasma 
Capric. 


DT1141


Bacillus 
Subtilis 


DL1542 


19 


Haemophilus 
Influ. 


DD2002 


Aeromonas 
Hydroph. 


DP1780 


Mycoplasma 
Gen. 


DT1151 


E.Coli 


DL1664 


20 


Synechocystis 
Sp. 


DD2140 


Haemophilus 
Influ. 


DP2000 


Mycoplasma 
Mycoid. 


DT1180 


Azoarcus 
Sp.Bh72 


DL1950 


21 


Thermus 
Thermophi. 


RD1580 


Streptococcu 
s Mut. 


DP2070 


Mycoplasma 
Pneumo. 


DT1202 


Haemophilus 
Influ. 


DL2000 


1 EUKARYA 


Plasmodium 
Falcip. 


DD7500 


Synechocystis
Sp.


DP2142 


Acholeplas 
ma Laid. 


DT1230 


Haemophilus 
Influ. 


DL2001 


2 


Candida 
Albicans 


DD7600 


Salmonella 
Typhi. 


RP1702 


Treponema 
Pallidum 


DT1272 


Synechocystis
Sp.


DL2143 


3 


Phytophthora 
Par. 


DD7610 


Treponema 
Pallidum 


DP1272 


Borrelia 
Burgdorf. 


DT1280 


Synechococc 
us Sp. 


DL2150 


4 


Saccharomyces
Cer.


DD7630 


Helicobacter 
Pylo. 


DP1510 


Staphylococ 
. Aure. 


DT1480 


Plasmodium 
Falcip.


DL7500 



Table 1B (continued)


5 
6 




Saccharomyces
Cer.


DD7631 


3 

4 




Bacillus 
Circulans 


DP1560 


9 
10 




Helicobacter 
Pylo. 


DT1510 


2 

1 




Saccharomyc 
es Cer. 


RL7631 




Schizosaccha.
Pom.


DD7640 




E.Coli 


DP1662 




Bacillus 
Subtilis 


DT1541 


GAC 


Leishmania 
Tarent. 


DL7550 


7 


Euglena 
Gracilis 


RD7780 


5 

6 

1 GGC 

2 

3 

4 

5 

6 

1 GGU EUKARYA 

2 

3 


Synechocystis
Sp.


DP2140 


11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 

1 UGG 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 Thr« UGG EUBACT. 

20 


Thermus 
Thermophi. 


DT1580 


2 

3 

4 

5 

1 AAC 

2 

3 

1 AAU 

2 


Candida 
Tropicali 


DL7750 








Salmonella 
Typhi. 


RP1701 


Stigmatella 
Aurant 


DT1630 


Candida 
Lusitaniae 


DL7760 


1 Glu2 CUU ARCHAEA 


Archaeglobus 
Fulg. 


DE0340 


Treponema 
Pallidum 


DP1270 


Azospirillum 
Lipo. 


DT1720 


Pichia 
Guilliermon 


DL7770 


2 


Pyrococcus 
Furios. 


DE0400 


Streptomyces 
Ambo. 


DP1360 


Campylobac 
.Jejuni 


DT1861 


Candida 
Albicans 


RL7600 


3 


Methanococcus 
Jan. 


DE0650 


Mycobact. 
Tuberc. 


DP1400 


Haemophilu 
s Influ. 


DT2000 


Saccharomyc 
es Cer. 


DL7630 


4 


Methanococ.Va 
ni. 


DE0660 


E.Coli 


DP1661 


Synechocys 
tis Sp. 


DT2141 


Torulopsis 
Utilis 


RL7650 


5 


Methanotherm.
Fer.


DE0680 


Synechocystis
Sp.


DP2141 


Mycoplasma 
Capric. 


RT1141 


Candida 
Cylindra.


RL7660 


6 


Haloferax 
Volcanii 


RE0501 


Salmonella 
Typhi. 


RP1700 


Mycoplasma 
Mycoid. 


RT1180 


Dictyostelium 
Dis. 


DL7570 


1 CUC


Archaeglobus
Fulg. 


DE0341 


Plasmodium 
Falcip. 


DP7500 


Bacillus 
Subtilis 


RT1540 


Saccharomyc 
es Cer. 


DL7631 


2 


Ruminobacter 
Amylo 


DE0700 


Saccharomyc 
es Cer. 


DP7630 


E.Coli 


DT1662 


3 




Saccharomyc 
es Cer. 


DL7632 


3 


Haloferax 
Volcanii 


RE0500 


Saccharomyc 
es Cer. 


DP7631 


Thermotoga 
Marit. 


DT0990 








1 CUU EUBACT.


Mycoplasma 
Capric. 


DE1140 


4 


Ser" UCG ARCHAEA 


Torulopsis 
Utilis 


RP7650 


Mycoplasma 
Gen. 


DT1150 


1 


Arg' UCU ARCHAEA 


Archaeglobus 
Fulg. 


DR0344 


2 


Mycoplasma 
Gen. 


DE1150




Mycoplasma 
Pneumo. 


DT1201 


2 
3 

1 UCC

1 GCU 

2 

3 

1 GCC 

2 

1 GCG 

2 

3 

4 

1 UCU EUBACT. 

2 

3 

4 

5 

6 Arg' UCU EUBACT, 

7 


Methanococc 
us Jan. 


DR0650 


3 


Mycoplasma 
Mycoid. 


DE1180 


1 


Archaeglobus 
Fulg. 


DS0340 


Treponema 
Pallidum 


DT1271 


Methanococ. 
Vani. 


DR0660 


4 


Mycoplasma 
Pneumo. 


DE1200 


2 

3 

4 

5 

1 AGU 

2 

3 

1 AGC 

2 

3 

4 

1 AGG 

2 

3 

1 UCG EUBACT. 

2 Ser^ UCG EUBACT. 
3 


Halobacteriu 
m Mar. 


DS0440 


Borrelia 
Burgdorf. 


DT1281 


Archaeglobus 
Fulg. 


DR0340 


5 


Acholeplasma 
Laid. 


DE1230 


Methanococc 
us Jan. 


DS0651 


Helicobacter 
Pylo. 


DT1511 


Archaeglobus 
Fulg. 


DR0343 


6 


Treponema 
Pallidum 


DE1271 


Methanother 
m. Fer. 


DS0680 


Bacillus 
Subtilis 


DT1540 


Methanococc 
us Jan. 


DR0651 


7 


Borrelia 
Burgdorf. 


DE1280 


Haloferax 
Volcanii 


RS0500 


Thermus 
Thermophi. 


DT1581 


Haloferax 
Volcanii 


RR0502 


8 


Plesiomonas 
Shige. 


DE1460 


Archaeglobus 
Fulg. 


DS0341 


Stigmatella 
Aurant 


DT1631 


Archaeglobus 
Fulg. 


DR0342 


9 


Haemophilus 
Ducre. 


DE1490 


Methanococc 
us Jan. 


DS0652 


E.Coli 


DT1660 


Haloferax 
Volcanii 


RR0500 


10 


Lactobac.Bulga 
ric. 


DE1500 


Methanopyru 
s Kand. 


DS0760 


E.Coli 


DT1661 


Archaeglobus 
Fulg. 


DR0341 


11 


Helicobacter 
Pylo. 


DE1510 


Archaeglobus 
Fulg. 


DS0342 


E.Coli 


DT1664 


Methanococc 
us Jan. 


DR0652 


12 


Helicobacter 
Pylo. 


DE1511 


Sulfolobus 
Solfa. 


DS0860 


Listeria 
Ivanovii 


DT1680 


Halobacteriu 
m Cut. 


RR0380 


13 


Lactococcus 
Lactis 


DE1530 


Halobacteriu 
m Cut. 


RS0380 


Listeria 
Monocyto. 


DT1690 


Haloferax 
Volcanii 


RR0501 


14 


Bacillus 
Subtilis 


DE1540 


Haloferax 
Volcanii 


RS0501 


Pseudomon 
as Aer. 


DT1821 


Mycoplasma 
Capric. 


DR1141 


15 


Bacillus 
Subtilis 


DE1541 


Archaeglobus 
Fulg. 


DS0343 


Campylobac 
.Jejuni 


DT1860 


Mycoplasma 
Gen. 


DR1150 


16 


Bacillus Sp. 
Ps3 


DE1570 


Methanococc 
us Jan. 


DS0653 


Haemophilu 
s Influ. 


DT2001 


Mycoplasma 
Mycoid. 


DR1181 


17 


E.Coli 


DE1660 


Haloferax 
Volcanii 


RS0502 


Rhizobium
Legumino.


DT2030 


Mycoplasma 
Pneumo. 


DR1202 


18 


Aeromonas 
Hydroph. 


DE1780 


Mycoplasma 
Capric. 


DS1141 


Synechocys 
tis Sp. 


DT2142 


Acholeplasma 
Laid. 


DR1230 


19 Glu^ CUU EUBACT, 


Haemophilus 
Influ. 


DE2000 


Mycoplasma 
Gen. 


DS1150 


E.Coli 


RT1660 


Treponema 
Pallidum 


DR1274 


20 


Salmonella 
Enteri. 


DE2040 


Mycoplasma 
Pneumo. 


DS1200 


E.Coli 


RT1661 


Borrelia 
Burgdorf. 


DR1282 



Table 1B (continued)


21 




Synechocystis 
Sp. 


DE2140 


4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 

1 AGU 

2 

3 

4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 


Acholeplasma 
Laid.


DS1230 


1 UGA 

1 UGC 

2 

3 

4 

5 

6 

7 

1 UGU EUKARYA 

2 

3 

4 

5 

1 UGA 

2 

3 

1 UGC 

2 

3 


Mycoplasma 
Capric. 


DT1140


8 
9 
10 
11 
12 
13 

1 UCC 
2 
3 
4 
5 
6 
7 
8 

1 GCU 
2 
3 
4 
5 

1 GCC 
2 
3 
4 
5 
6 
7 

1 GCG 
2 
3 
4 
5 

1 GCA 
2 


Helicobacter 
Pylo. 


DR1512 


22 


E.Coli 


RE1661 


Treponema 
Pallidum 


DS1272 


Mycoplasma 
Gen. 


DT1152


E.Coli 


DR1661 


23 


E.Coli 


RE1662 


Borrelia 
Burgdorf. 


DS1283 


Mycoplasma 
Pneumo. 


DT1200 


Haemophilus 
Influ. 


DR2001 


1 CUC


Treponema 
Pallidum 


DE1270 


Streptomyces 
Liv.


DS1350 


Treponema 
Pallidum 


DT1270 


Synechocystis
Sp.


DR2142 


2 


Streptomyces 
Rim. 


DE1340 


Helicobacter 
Pylo.


DS1512 


Clostridium 
Aceto. 


DT1450 


E.Coli 


RR1662 


3 


Streptomyces 
Liv. 


DE1350 


Bacillus 
Subtilis 


DS1542 


E.Coli 


DT1663 


E.Coli 


RR1663 


4 


Streptomyces 
Liv. 


DE1351 


Haemophilus 
Influ. 


DS2003 


Pseudomon 
as Aer. 


DT1820 


Mycoplasma 
Gen. 


DR1152 


1 CUU EUKARYA 


Plasmodium 
Falcip. 


DE7500 


Synechocystis
Sp.


DS2142 


Synechocys 
tis Sp. 


DT2140 


Treponema 
Pallidum 


DR1271 


2 


Dictyostelium 
Dis. 


DE7570 


Mycoplasma 
Capric. 


RS1140


Plasmodium 
Falcip. 


DT7500 


Agrobacter.T 
ume. 


DR1420 


3 


Saccharomyces
Cer.


DE7630 


Bacillus 
Subtilis 


RS1541 


Leishmania 
Tarent. 


DT7550 


Helicobacter 
Pylo. 


DR1513 


4 


Saccharomyces
Cer.


DE7632 


E.Coli 


RS1661 


Saccharomy 
ces Cer. 


DT7632 


E.Coli 


DR1664 


5 


Schizosaccha.
Pom.


DE7640 


E.Coli 


DS1663 


Eimeria 
Tenella 


DT7680 


Prochlococcus 
Mar. 


DR1800 


1 CUC


Saccharomyces
Cer.


DE7631 


Mycoplasma 
Capric. 


DS1140 


Toxoplasma 
Gondoii 


DT7730 


Streptomyces 
Vene. 


DR2050 


2 


Schizosaccha.P 
om. 


DE7641 


Mycoplasma 
Gen. 


DS1151 


Dictyosteliu 
m Dis. 


DT7570 


Synechocystis
Sp.


DR2141 








Mycoplasma 
Mycoid. 


DS1180 


Saccharomy 
ces Cer. 


DT7630 


Mycoplasma 
Gen. 


DR1153 


1 Asn^ UUG ARCHAEA 


Archaeglobus 
Fulg. 


DN0340 


Mycoplasma 
Pneumo. 


DS1203 


Saccharomy 
ces Cer. 


RT7631 


Mycoplasma 
Pneumo. 


DR1201 


2 


Methanococcus 
Jan. 


DN0650 


Acholeplasma 
Laid.


DS1231 


Trypanosom 
a Brucei 


DT7520 


Treponema 
Pallidum 


DR1273 


3 


Methanococ.Va 
ni. 


DN0660 


Spiroplasma 
Melif.


DS1260 


Dictyosteliu 
m Dis. 


DT7571 


Borrelia 
Burgdorf. 


DR1281 


4 


Methanotherm.
Fer.


DN0680 


Treponema 
Pallidum 


DS1273 


Saccharomy 
ces Cer. 


DT7631 


Helicobacter 
Pylo. 


DR1511 


5 


Halobacterium 
Cut. 


RN0380 


Borrelia 
Burgdorf. 


DS1281 


4 


Met^ UAC ARCHAEA 


Schizosacch 
a. Pom. 


DT7640 


Treponema 
Pallidum 


DR1272 


6 


Haloferax 
Volcanii 


RN0500 


Streptomyces 
Gris. 


DS1300 




E.Coli 


DR1660 


7 


Methanobac.Th 
erm. 


RN0620 


Staphylococ. 
Aure. 


DS1480 


1 


Archaeglobu 
s Fulg. 


DM0340 


Salmonella 
Typhi. 


DR1700 


1 EUBACT. 


Mycoplasma 
Capric. 


DN1140 


Staphylococ. 
Aure. 


DS1481 


2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
1 EUBACT. 


Archaeglobu 
s Fulg. 


DM0341 


Haemophilus 
Influ. 


DR2000 


2 


Mycoplasma 
Gen. 


DN1150 


Helicobacter 
Pylo. 


DS1510 


Archaeglobu 
s Fulg. 


DM0342 


Synechocystis
Sp.


DR2140 


3 


Mycoplasma 
Mycoid. 


DN1180 


Lactococcus 
Lactis 


DS1530 


Methanococ 
cus Jan. 


DM0651 


E.Coli 


RR1664 


4 


Mycoplasma 
Pneumo. 


DN1200 


Bacillus 
Subtilis 


DS1541 


Methanococ 
cus Jan. 


DM0652 


Aeromonas 
Hydroph. 


DR1780 


5 


Acholeplasma 
Laid. 


DN1230 


E.Coli 


DS1661 


Methanotherm.
Fer.


DM0680 


Mycoplasma 
Gen. 


DR1151 


6 


Treponema 
Pallidum 


DN1270 


Haemophilus 
Influ. 


DS2000 


Thermoplas 
ma Acid. 


DM0900 


Mycoplasma 
Pneumo. 


DR1200 


7 


Borrelia 
Burgdorf. 


DN1280 


Haemophilus 
Influ. 


DS2002 


Thermofil. 
Pendens 


DM0960 


Treponema 
Pallidum 


DR1270 


8 


Streptomyces 
Liv. 


DN1350 


Synechocystis
Sp.


DS2140 


Thermotoga 
Marit. 


DM0990 


Borrelia 
Burgdorf. 


DR1280 


9 


Streptomyces 
Liv. 


DN1351 


Mycoplasma 
Capric. 


RS1141 


Thermotoga 
Marit. 


DM0991 


Helicobacter 
Pylo. 


DR1514 


10 


Klebsiella 
Aeroge. 


DN1410 


Bacillus 
Subtilis 


RS1540 


Haloferax 
Volcanii 


RM0500 


Mycoplasma 
Capric. 


DR1140 


11 


Lactobac.Bulga 
ric. 


DN1500 


E.Coli 


RS1664 


Mycoplasma 
Capric. 


DM1140 


Mycoplasma 
Mycoid. 


DR1180 



Table 1B (continued)



Amino Anti Domain 
acid codon 
3'-5' 



UUG EUKARYA 



12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 

1 

2 

3 

4 

5 

6 

7 



1 Gln^ GUU ARCHAEA 

2 

3 

4 

1 GUC 

2 

1 

2 
3 
4 
5 



GUU EUBACT.



Species 


Access 
no. 


Lactococcus 
Lactis 


DN1530 


Bacillus 
Subtilis 


DN1540 


Bacillus 
Subtilis 


DN1541 


Bacillus Sp. 
Ps3 


DN1570 


E.Coli 


DN1660 


Listeria 
Ivanovii 


DN1680 


Listeria 
Monocyto. 


DN1690 


Haemophilus 
Influ. 


DN2000 


Haemophilus 
Influ.


DN2001 


Haemophilus 
Influ. 


DN2001 


Synechocystis 
Sp. 


DN2140 


Azospirillum 
Lipo. 


RN1720 


Azospirillum 
Lipo. 


RN1721 


Plasmodium 
Falcip. 


DN7500 


Trypanosoma 
Brucei 


DN7520 


Trypanosoma 
Brucei 


DN7521 


Tetrahymena 
Pyrif. 


DN7530 


Dictyostelium 
Dis. 


DN7570 


Saccharomyces
Cer.


DN7630 


Schizosaccha.P 
om. 


DN7640 


Yersinia 
Pseudotu. 


DN7740 






Archaeglobus 
Fulg. 


DQ0341 


Methanococcus 
Jan. 


DQ0650 


Methanococ.Va 
ni. 


DQ0660 


Methanopyrus 
Kand. 


DQ0760 


Halobacterium
Cut. 


RQ0380 


Haloferax 
Volcanii 


RQ0500 


Mycoplasma 
Capric. 


DQ1140 






Mycoplasma 
Gen. 


DQ1150 


Mycoplasma 
Pneumo. 


DQ1200 


Acholeplasma 
Laid. 


DQ1230 


Treponema 
Pallidum 


DQ1271 



Amino Anti-
acid codon 
3'-5' 
AGC 



Ser" AGG 



UCG EUKARYA 



Species 


Access 
no. 




Mycoplasma 
Gen. 


DS1153 


2 


Mycoplasma 
Pneumo. 


DS1201 


3 


Spiroplasma 
Citri 


DS1250 


4 


Treponema 
Pallidum 


DS1271 


5 


Lactobac.Del 
bruec. 


DS1520 


6 


E.Coli 


DS1660 


7 


Synechocystis
Sp.


DS2143 


8 


Lactobac.Bul 
garic. 


DS1500 


9 


Mycoplasma 
Gen. 


DS1152 


10 


Mycoplasma 
Pneumo. 


DS1202 


11 


Treponema 
Pallidum 


DS1270 


12 


Borrelia 
Burgdorf. 


DS1282 


13 


Helicobacter 
Pylo. 


DS1511 


14 


Bacillus 
Subtilis 


DS1540 


15 


Bacillus Sp. 
Ps3 


DS1570 


16 


E.Coli 


DS1664 


17 


Haemophilus 
Influ. 


DS2001 


18 


Haemophilus 
Influ. 


DS2004 


19 


Clostridium 
Perfr. 


DS2130 


20 


Synechocystis
Sp.


DS2141 


21 


Synechococc 
us Sp. 


DS2150 


22 


Bacillus 
Subtilis 


RS1542 


23 


E.Coli 


RS1662 


24 


E.Coli 


RS1663 


1 


Plasmodium 
Falcip. 


DS7500 


2 


Dictyostelium 
Dis. 


DS7571 


3 


Saccharomyc 
es Cer. 


DS7631 


4 


Candida 
Cylindra. 


RS7664 


5 


Plasmodium 
Falcip. 


DS7501 


6 








Dictyostelium 
Dis. 


DS7570 




Podospora 
Anserina 


DS7620 


1 


Podospora 
Anserina 


DS7621 


2 


Saccharomyc 
es Cer. 


DS7633 


3 



Amino Anti- 
acid codon 
3'-5' 



UAG ARCHAEA 



Species 


Access 
no. 




Mycoplasma 
Gen. 


DM1150 


3 


Mycoplasma 
Gen. 


DM1151 


4 


Mycoplasma 
Mycoid. 


DM1180 


5 


Mycoplasma 
Pneumo. 


DM1200 


6 


Acholeplas 
ma Laid. 


DM1230 


7 


Acholeplas 
ma Laid. 


DM1231 


8 


Spiroplasma 
Melif. 


DM1260 


9 


Treponema 
Pallidum 


DM1270 


10 


Treponema 
Pallidum 


DM1271 


11 


Borrelia 
Burgdorf. 


DM1280 


12 


Borrelia 
Burgdorf. 


DM1281 


13 


Staphylococ 
. Aure. 


DM1480 


14 


Helicobacter 
Pylo. 


DM1510 


15 


Helicobacter 
Pylo. 


DM1511 


1 


Bacillus 
Subtilis 


DM1540 


2 


Bacillus 
Subtilis 


DM1541 


3 


E.Coli 


DM1660 


4 


Photobac. 
Leiogna. 


DM1750 


5 


Haemophilu 
s Influ. 


DM2000 


1 


Haemophilu 
s Influ. 


DM2001 


2 


Haemophilu 
s Influ. 


DM2002 


1 


Haemophilu 
s Influ. 


DM2004 


1 


Thermus 
Thermophi. 


RM1580 


2 


Plasmodium 
Falcip. 


DM7500 


3 


Plasmodium 
Falcip. 


DM7501 


4 


Dictyosteliu 
m Dis. 


DM7570 


5 


Saccharomy 
ces Cer. 


DM7630 


6 


Saccharomy 
ces Cer. 


DM7631 


7 


Schizosacch 
a. Pom. 


DM7640 


8 








9 


Archaeglobu 
s Fulg. 


DI0340 




Methanococ 
cus Jan. 


DI0650 


1 


Haloferax 
Volcanii 


RI0500 


2 



Ami Anti- Domain 
no codon 
acid 3'-5' 



UCU EUKARYA 



UCC 



GCU 
GCA 



1 Lys' UUU ARCHAEA 



Species 


Access 
no. 


Spiroplasma 
Melif. 


DR1260 


Streptomyces 
Liv. 


DR1350 


Staphylococ. 
Aure. 


DR1480 


Lactobac.Bul 
garic. 


DR1500 


Bacillus 
Subtilis 


DR1540 


E.Coli 


DR1663 


Salmonella 
Typhi. 


DR1701 


Haemophilus 
Influ. 


DR2002 


Haemophilus 
Influ. 


DR2003 


Haemophilus 
Influ. 


DR2004 


Synechocystis
Sp.


DR2143 


E.Coli 


RR1660 


E.Coli 


RR1661 


Plasmodium 
Falcip. 


DR7501 


Trypanosoma 
Brucei 


DR7521 


Dictyostelium 
Dis. 


DR7571 


Saccharomyc 
es Cer. 


DR7631 


Saccharomyc 
es Cer. 


RR7632 


Trypanosoma 
Brucei 


DR7520 


Saccharomyc 
es Cer. 


DR7632 


Leishmania 
Tarent. 


DR7551 


Plasmodium 
Falcip. 


DR7500 


Trypanosoma 
Brucei 


DR7522 


Leishmania 
Tarent. 


DR7550 


Dictyostelium 
Dis. 


DR7570 


Neurospora 
Crassa 


DR7590 


Saccharomyc 
es Cer. 


DR7630 


Schizosaccha 
.Pom. 


DR7640 


Schizosaccha 
.Pom. 


DR7641 






Toxoplasma 
Gondoii 


DR7730 




Archaeglobus 
Fulg. 


DK0340 


Methanococc 
us Jan. 


DK0650 



Table 1B (continued)



Amino Anti Domain 
acid codon 
3'-5' 



6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
1 
2 
3 
4 
5 
6 
7 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
1 
2 
3 
4 
5 



GUU EUKARYA 



Ala^ 



CGU ARCHAEA 



Species 


Access 
no. 


Amino 
acid 


Borrelia 
Burgdorf. 


DQ1280 


6 


Staphylococ.
Aure. 


DQ1480 


7 


Helicobacter 
Pylo. 


DQ1510 


8 


Bacillus 
Subtilis 


DQ1540 


1 


E.Coli 


DQ1660 


2 


Haemophilus 
Influ. 


DQ2000 


3 


Haemophilus 
Influ. 


DQ2001 


4 


Synechocystis 
Sp. 


DQ2140 






Mycoplasma 
Capric. 


RQ1140 


1 Cys= 


E.Coli 


RQ1661 


2 


Treponema 
Pallidum 


DQ1270 


3 


Streptomyces 
Rim. 


DQ1340


4 


Streptomyces 
Rim. 


DQ1341


1 


Streptomyces 
Liv. 


DQ1350


2 


Streptomyces 
Liv. 


DQ1351


3 


E.Coli 


DQ1661 


4 


E.Coli 


RQ1660 


5 


Plasmodium 
Falcip. 


DQ7500 


6 


Trypanosoma 
Brucei 


DQ7520 


7 


Trypanosoma 
Brucei 


DQ7521 


8 


Leishmania 
Tarent. 


DQ7550 


9 


Dictyostelium 
Dis. 


DQ7570 


10 


Saccharomyces
Cer.


DQ7630 


11 


Saccharomyces
Cer.


DQ7632 


12 


Schizosaccha.P 
om. 


DQ7640 


1 


Toxoplasma 
Gondoii 


DQ7730 


2 


Tetrahymena 
Therm. 


RQ7542 


3 


Saccharomyces
Cer.


DQ7631


Crithidia 
Fascic. 


DQ7670 


1 Gly= 


Leishmania 
Mexica. 


DQ7700


2 


Leptomonas 
Collos. 


DQ7710


3 


Leptomonas 
Seymou. 


DQ7720


1 




2 


Halorubrum 
Distri. 


DA0310 


3 



AGC 



CCU ARCHAEA



Species 


Access 
no. 


Schizosaccha 
.Pom. 


DS7640 


Schizosaccha 
.Pom. 


DS7641 


Candida 
Cylindra. 


RS7661 


Saccharomyc 
es Cer. 


DS7632 


Saccharomyc 
es Cer. 


DS7634 


Schizosaccha 
.Pom. 


DS7642 


Candida 
Cylindra. 


RS7663 




Archaeglobus 
Fulg. 


DC034 



Halobacteriu 
m Cut. 


DC038 



Haloferax 
Volcanii 


DC050 



Methanococc 
us Jan. 


DC065 



Mycoplasma 
Capric. 


DC114 



Mycoplasma 
Gen. 


DC115 



Mycoplasma 
Pneumo. 


DC120 



Spiroplasma 
Melif. 


DC126 



Treponema 
Pallidum 


DC127 



Streptomyces 
Liv.


DC135 



Staphylococ. 
Aure. 


DC148 



Helicobacter 
Pylo. 


DC151 



Bacillus 
Subtilis 


DC154 



E.Coli 


DC166 



Haemophilus 
Influ. 


DC200 



Synechocystis
Sp.


DC214 



Plasmodium 
Falcip. 


DC750 



Saccharomyc 
es Cer. 


DC763 



Schizosaccha 
.Pom. 


DC764 





Archaeglobus 
Fulg. 


DG034 



Methanococc 
us Jan. 


DG065 



Haloferax 
Volcanii 


RG050 
3 


Halobacteriu 
m Cut. 


RG038 



Methanococc 
us Jan. 


DG065 

1 


Haloferax 
Volcanii 


RG050 

1 



Amino Anti- Domain 
acid codon 
3'-5' 
UAC 



UAU 

UAG EUBACT. 



Species 


Access 
no. 


Methanococ 
.Vani. 


DI0660 


Methanothe 
rm. Fer. 


DI0680 


Haloferax 
Volcanii 


RI0501 


Bartonella 
Bacil. 


DI1100


Bartonella 
Elizab. 


DI1110


Bartonella 
Hensela 


DI1120 


Bartonella 
Quint. 


DI1130 


Mycoplasma 
Capric. 


DI1141 


Mycoplasma 
Gen. 


DI1150 


Acetobacter 
Aceti 


DI1160 


Acetobacter 
Europ. 


DI1170 


Acetobacter 
Hanse. 


DI1190 


Mycoplasma 
Pneumo. 


DI1201 


Acetobacter 
Lique. 


DI1210 


Acetobacter 
Lique. 


DI1211 


Acholeplas 
ma Laid. 


DI1230 


Acetobacter 
Xylin. 


DI1240 


Treponema 
Pallidum 


DI1270 


Borrelia 
Burgdorf. 


DI1280 


Borrelia 
Burgdorf. 


DI1280 


Burkholderia
Cepa.


DI1320 


Coxiella 
Burnetii 


DI1330 


Gluconobact 
er Oxy. 


DI1370 


Lactobac.
Bulgaric.


DI1500 


Helicobacter 
Pylo. 


DI1510 


Lactococcus 
Lactis 


DI1530 


Bacillus 
Subtilis 


DI1540 


Bacillus 
Subtilis 


DI1541 


Lactobac.Ac 
idophi. 


DI1550 


Lactobac.Ca 
sei 


DI1590 


Rhodotherm 
us Mar. 


DI1600 


Lactobac.Cu 
rvatus 


DI1610 


Thiobacillus 
Ferro 


DI1620 


Lactobac.
Helvetic.


DI1640 



Ami Anti- Domain 
no codon 
acid 3'-5' 



UUC 



UUU EUBACT. 



UUU EUKARYA 



Species 


Access 
no. 


Methanococ. 
Vani. 


DK0660 


Methanother 
m. Fer. 


DK0680 


Methanococ. 
Voltae 


DK0740 


Methanopyru 
s Kand. 


DK0760 


Haloferax 
Volcanii 


RK0500 


Archaeglobus 
Fulg. 


DK0341 


Haloferax 
Volcanii 


RK0501 


Mycoplasma 
Capric. 


DK1140 


Mycoplasma 
Gen. 


DK1150 


Mycoplasma 
Pneumo. 


DK1200 


Mycoplasma 
Pg50 


DK1220 


Acholeplasma 
Laid. 


DK1231 


Treponema 
Pallidum 


DK1271 


Borrelia 
Burgdorf. 


DK1280 


Staphylococ. 
Aure. 


DK1480 


Helicobacter 
Pylo. 


DK1510 


Bacillus 
Subtilis 


DK1540 


Bacillus 
Subtilis 


DK1541 


E.Coli 


DK1660 


Azospirillum 
Lipo. 


DK1720 


Haemophilus 
Influ. 


DK2000 


Haemophilus 
Influ. 


DK2001 


Synechocystis
Sp.


DK2140 


Mycoplasma 
Capric. 


RK1141 


Mycoplasma 
Capric. 


DK1141 


Mycoplasma 
Gen. 


DK1151 


Mycoplasma 
Pneumo. 


DK1201 


Acholeplasma 
Laid. 


DK1230 


Treponema 
Pallidum 


DK1270 


Borrelia 
Burgdorf. 


DK1281 


Streptomyces 
Liv. 


DK1350 


Haemophilus 
Influ. 


DK2002 


Mycoplasma 
Capric. 


RK1140 


Plasmodium 
Falcip. 


DK7500 



Table 1B (continued)



Amino 
acid 



Anti- Domain
cod on 
3'-5' 



2 

3 

4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 

1 

2 

3 

1 

2 

3 

4 

1 

2 

3 

4 

5 

6 

7 



CGG 



CGC 



CGU EUBACT.



Species 


Access 
no. 


Halorubrum 
Lacusp. 


DA0320 


Halorubrum 
Saccha. 


DA0330


Archaeglobus
Fulg.


DA0340


Halorubrum
Sodome.


DA0350


Halorubrum
Vacuol.


DA0360


Natronobac.
Grego.


DA0370


Halobacterium 
Cut. 


DA0380 


Natronobac. 
Phara. 


DA0390 


Halobacterium 
Hal. 


DA0420 


Methanobac.Fo 
rmi. 


DA0580 


Methanobac.Th 
erm. 


DA0620 


Methanococcus 

Jan. 


DA0650 


Methanococ.Va 
ni. 


DA0660 


Methanothrix 
Soeh. 


DA0670 


Methanotherm. 
Fer. 


DA0680 


Methanospir. 
Hung. 


DA0780 


Thermococcus 
Celer 


DA0940 


Thermoprot. 
Tenax 


DA0980 


Haloferax 
Volcanii 


RA0502 


Archaeglobus 
Fulg. 


DA0342 


Methanococcus 

Jan. 


DA0651 


Haloferax 
Volcanii 


RA0501 


Archaeglobus 
Fulg. 


DA0341 


Thermoprot. 
Tenax 


DA0981 


Halobacterium 
Cut. 


RA0380 


Haloferax 
Volcanii 


RA0500 


Bartonella 
Elizab. 


DA1110


Bartonella 
Quint. 


DA1130 


Mycoplasma 
Capric. 


DA1140 


Mycoplasma 
Gen. 


DA1150 


Acetobacter 
Aceti 


DA1160 


Acetobacter 
Europ. 


DA1170 


Mycoplasma 
Mycoid. 


DA1180 


Acetobacter 
Hanse. 


DA1190 



Amino Anti Domain 
acid codon 
3'-5' 



CCC 



CCU EUBACT.



Species 


Access 
no. 


Haloferax 
Volcanii 


RG050 
2 


Methanobac. 
Therm. 


RG062 



Archaeglobus 
Fulg. 


DG034 

1 


Sulfolobus 
Solfa. 


DG086 



Thermofil. 
Pendens 


DG096 



Haloferax 
Volcanii 


RG050 



Mycoplasma 
Capric. 


DG114 



Mycoplasma 
Gen. 


DG115 

1 


Mycoplasma 
Mycoid. 


DG118 



Mycoplasma 
Pneumo. 


DG120 



Treponema 
Pallidum 


DG127 



Borrelia 
Burgdorf. 


DG128 



Streptomyces 
Liv. 


DG135 

1 


Staphylococ. 
Aure. 


DG148 

1 


Staphylococ. 
Aure. 


DG148 
2 


Staphylococ. 
Aure. 


DG148 
3 


Helicobacter 
Pylo. 


DG151 

1 


Lactococcus 
Lactis 


DG153 



Bacillus 
Subtilis 


DG154 



Bacillus 
Subtilis 


DG154 

1 


Stigmatella 
Aurant 


DG163 



E.Coli 


DG166 



Pseudomonas 
Aer. 


DG182 



Campylobac.J 
ejuni 


DG186 



Rickettsia 
Prow. 


DG187 



Haemophilus 
Influ. 


DG200 

2 


Synechocystis
Sp.


DG214 



Staphylococ. 
Epid. 


RG138 



Staphylococ. 
Epid. 


RG138 

1 


E.Coli 


RG166 
2 


Salmonella 
Typhi. 


RG170 

1 


Mycoplasma 
Gen. 


DG115 



Mycoplasma 
Pneumo. 


DG120 

1 


Acholeplasma
Laid. 


DG123 




Amino Anti- 
acid codon 
3'-5' 
32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 

59 

60 

61 

62 

63 

1 UAC 

2 



Species 


Access 
no. 




E.Coli 


DI1660 


2 


Mycobact.Le 
prae 


DI1710 


3 


Trichodesmi 
um Sp. 


DI1730 


4 


Mycoplasma 
Sp. 


DI1760 


5 


Phytoplasm 
a Sp. 


DI1770 


6 


Aeromonas 
Hydroph. 


DI1780 


1 


Prevotella 
Rumini. 


DI1790 


2 


Pseudomon 
as Cepac. 


DI1810 


3 


Pseudomon 
as Aer. 


DI1820 


4 


Pseudomon 
as Glad. 


DI1830 


5 


Pseudomon 
as Fluor. 


DI1840 


6 


Pseudomon 
as Mallei 


DI1850 


7 


Campylobac 
.Jejuni 


DI1860 


Pseudomon 
as Mend. 


DI1880 


1 


Caulobacter 
Cres. 


DI1890 


2 


Brucella 
Suis 


DI1900 


3 


Brucella 
Melitens. 


DI1910 


4 


Brucella 
Abortus 


DI1920 


5 


Brucella 
Abortus 


DI1921 


1 


Ochrobactru 
m Anth. 


DI1960 


2 


Pseudomon 
as Pick. 


DI1970 


3 


Pseudomon 
as Pseud. 


DI1990 


4 


Haemophilu 
s Influ. 


DI2001 


5 


Salmonella 
Enteri. 


DI2040 


6 


Stenotro.
Maltoph.


DI2080 


7 


Xanthomon 
as Campe. 


DI2090 


8 


Anacystis 
Nidulans 


DI2100 


9 


Synechocys 
tis Sp. 


DI2141 


10 


Synechocys 
tis Sp. 


DI2142 


11 


Mycoplasma 
Mycoid. 


RI1180 


12 


Thermus 
Thermophi. 


RI1580 


13 


E.Coli 


RI1661 


14 


Mycoplasma 
Capric. 


DI1140 


15 


Mycoplasma 
Gen. 


DI1151 


16 



Ami Anti- 
no codon 
acid 3'-5' 



UUC 



1 Phe' AAG ARCHAEA 



EUBACT. 



Species 


Access 
no. 


Trypanosoma 
Brucei 


DK7521 


Leishmania 
Tarent. 


DK7550 


Dictyostelium 
Dis. 


DK7570 


Saccharomyc 
es Cer. 


DK7630 


Saccharomyc 
es Cer. 


RK7631 


Trypanosoma 
Brucei 


DK7520 


Trypanosoma 
Brucei 


DK7522 


Dictyostelium 
Dis. 


DK7571 


Saccharomyc 
es Cer. 


DK7631 


Saccharomyc 
es Cer. 


DK7632 


Schizosaccha 
.Pom. 


DK7640 


Saccharomyc 
es Cer. 


RK7630 






Archaeglobus 
Fulg. 


DF0340 


Methanococc 
us Jan. 


DF0650 


Methanococ. 
Vani. 


DF0660 


Sulfolobus 
Solfa. 


DF0860 


Haloferax 
Volcanii 


RF0500 


Mycoplasma 
Capric. 


DF1140 


Mycoplasma 
Gen. 


DF1150 


Mycoplasma 
Mycoid. 


DF1180 


Acholeplasma 
Laid. 


DF1230 


Spiroplasma 
Melif. 


DF1260 


Treponema 
Pallidum 


DF1270 


Borrelia 
Burgdorf. 


DF1280 


Borrelia 
Burgdorf. 


DF1281 


Staphylococ. 
Aure. 


DF1480 


Helicobacter 
Pylo. 


DF1510 


Lactococcus 
Lactis 


DF1530 


Bacillus 
Subtilis 


DF1540 


Bacillus 
Subtilis 


DF1541 


E.Coli 


DF1660 


Haemophilus 
Influ. 


DF2000 


Haemophilus 
Influ. 


DF2001 



Table IB. (continued) 



Amino Anti Domain 
acid codon 
3'-5' 



9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
40 
41 
42 



Species 



Mycoplasma
Pneumo. 



Acetobacter 
Lique. 



Acetobacter 
Lique. 



Acholeplasma 
Laid. 



Acholeplasma 
Laid. 



Acetobacter 
Xylin. 



Spiroplasma 
Melif. 



Treponema 
Pallidum 



Borrelia 
Burgdorf. 



Bordetella Sp. 



Burkholderia 
Cepa. 



Coxiella 
Burnetii 



Gluconobacter 
Oxy. 



Enterococcus 
Hirae 



Staphylococ. 
Aure. 



Lactobac.Bulga 



Helicobacter 
Pylo. 



Lactococcus 
Lactis 



Lactococcus 
Lactis 



Lactococcus 
Lactis 



Bacillus 
Subtilis 



Bacillus 
Subtilis 



Bacillus 
Subtilis 



Bacillus 
Subtilis 



Lactobac.
Acidophi.



Lactobac.Casei 



Rhodothermus 
Mar. 



Lactobac.
Curvatus



Thiobacillus 
Ferro 



Lactobac.
Helvetic.



Leuconostoc 
Lactis 



Leuconostoc 
Mesen. 



Mycobact.Lepr 



DA1200



DA1210 



DA1211 



DA1230



DA1231 



DA1240 



DA1500 



DA1511 



DA1530 



DA1531 



DA1532 



DA1540 



DA1541 



DA1542 



DA1543 



Amino Anti Domain 
acid codon 
3'-5' 



CCC 



CCU EUKARYA



1 Val= CAU ARCHAEA 

2 

3 

1 CAG 



Species 



Treponema 
Pallidum 



Borrelia 
Burgdorf. 



Streptomyces 
Liv. 



Staphylococ. 
Aure. 



Lactobac.
Bulgaric.



Helicobacter 
Pylo. 



Bacillus 
Subtilis 



Bacillus 
Subtilis 



Thermus 
Thermophi. 



Azorhizobium 
Caul. 



Haemophilus 
Influ. 



Haemophilus 
Influ. 



Pseudomonas 
Putida 



Synechocysti 
s Sp. 



Thermus 
Thermophi. 



E.Coli 



Synechocysti 
s Sp. 



Streptomyces 
Coel. 



Thermus 
Thermophi. 



E.Coli 



Salmonella 
Typhi. 



Plasmodium 
Falcip. 



Leishmania 
Tarent. 



Schizosaccha 
.Pom. 



Saccharomyc 
es Cer. 



Saccharomyc 
es Cer. 



Saccharomyc 
es Cer. 



Schizosaccha 
.Pom. 



DG127 

1 



DG128 

1 



DG135 




DG148 




DG150 




DG151 




DG154 

1 



DG154 
2 



DG158 




DG166 

1 



DG193 




DG200 




DG200 

1 



DG201 




DG214 
2 



Amino Anti- Domain 
acid codon 
3'-5' 



DG1581 



DG166 
2 



DG214 

1 



RG131 




RG158 




RG166 




RG170 




DG750 

1 



DG755 




DG764 

1 



RG763 

1 



DG763 




DG763 

1 



DG764 




Archaeglobus 
Fulg. 



Methanococc 
us Jan. 



Methanococ. 
Vani. 



Archaeglobus 
Fulg. 



UAG EUKARYA 
UAU 



1 Leu' GAU ARCHAEA 

2 

3 

4 

5 

1 GAG 

2 

3 

4 

1 GAC 

2 

3 

4 

5 

1 AAC 

2 

3 

1 AAU 



Species 



Mycoplasma 
Mycoid. 



Mycoplasma 
Pneumo. 



Acholeplas 
ma Laid. 



Spiroplasma 
Melif. 



Staphylococ 
. Aure. 



Bacillus 
Subtilis 



Azoarcus 
Sp.Bh72 



Haemophilu 
s Influ. 



Synechocys 
tis Sp. 



Bacillus 
Subtilis 



Plasmodium 
Falcip. 



Leishmania 
Tarent. 



Saccharomy 
ces Cer. 



Access 
no. 



DI1180 



DI1200 



DI1230 



DI1260 



DI1480 



DI1542 



Archaeglobu 
s Fulg. 



Methanococ 
cus Jan. 



Methanococ 
.Vani. 



Methanothe 
rm. Fer. 



Haloferax 
Volcanii 



Archaeglobu 
s Fulg. 



Methanococ 
cus Jan. 



Methanopyr 
us Kand. 



Haloferax 
Volcanii 



Archaeglobu 
s Fulg. 



Halobacteri 
um Mar. 



Sulfolobus 
Solfa. 



Thermoprot 
. Tenax 



Haloferax 
Volcanii 



Archaeglobu 
s Fulg. 



Thermoprot 
. Tenax 



Haloferax 
Volcanii 



Archaeglobu 
s Fulg. 



DL0340 



DL0650 



DL0660 



DL0680 



RL0504 



DL0341 



DL0651 



DL0760 



Ami Anti- Domain 
no codon 
acid 3'-5' 



17 
18 
19 
20 
21 

1 

2 

3 

4 

5 

6 

7 

8 

9 
10 

1 Tyr" 

2 

3 

4 

5 

6 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 



AUG ARCHAEA 



EUBACT. 



Species 


Access 
no. 


Bacillus 
Stearo. 


DF2120 


Synechocystis
Sp.


DF2140 


Thermus 
Thermophi. 


RF1580 


Rhodospiril.
Rub.


RF2020 


Agmenellum 
Quadr. 


RF2060 


Neurospora 
Crassa 


DF7590


Saccharomyc 
es Cer. 


DF7630 


Saccharomyc 
es Cer. 


DF7631 


Saccharomyc 
es Cer. 


DF7632 


Saccharomyc 
es Cer. 


DF7633 


Schizosaccha 
.Pom. 


DF7640


Schizosaccha 
.Pom. 


DF7641 


Toxoplasma 
Gondii


DF7730 


Scenedesmus 
Obliq. 


RF7560 


Euglena 
Gracilis 


RF7780 






Archaeglobus 
Fulg. 


DY0340


Methanococc 
us Jan. 


DY0650


Methanococ. 
Vani. 


DY0660


Methanococ. 
Voltae 


DY0740 


Thermotoga 
Marit. 


DY0990


Haloferax 
Volcanii 


RY0500


Mycoplasma 
Capric. 


DY1140


Mycoplasma 
Gen. 


DY1150


Mycoplasma 
Pneumo.


DY1200


Treponema 
Pallidum 


DY1270 


Borrelia 
Burgdorf. 


DY1280 


Streptomyces 
Liv. 


DY1350 


Staphylococ. 
Aure. 


DY1480 


Helicobacter 
Pylo. 


DY1510 


Bacillus 
Subtilis 


DY1540 


Bacillus 
Subtilis 


DY1541 


Thermus 
Thermophi. 


DY1580 


Stigmatella 
Aurant 


DY1630 



Table IB. (continued) 



Amino Anti Domain 
acid codon 
3'-5' 



43 
44 
45 
46 
47 
48 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 
61 
62 
63 
64 
65 
66 
67 
68 

1 

2 

3 

4 

5 

6 

1 

2 



Species 


Access 
no. 


Trichodesmium
Sp. 


DA1730 


Aeromonas 
Hydroph.


DA1780


Prevotella 
Rumini. 


DA1790


Pseudomonas 
Cepac. 


DA1810


Pseudomonas 
Aer. 


DA1820 


Pseudomonas 
Glad. 


DA1830


Pseudomonas 
Fluor. 


DA1840 


Pseudomonas 
Mallei 


DA1850 


Campylobac.
Jejuni


DA1860 


Pseudomonas 
Mend. 


DA1880 


Caulobacter 
Cres. 


DA1890 


Brucella Suis 


DA1900 


Brucella 
Melitens. 


DA1910 


Brucella 
Abortus 


DA1920 


Brucella 
Abortus 


DA1921 


Ochrobactrum 
Anth. 


DA1960 


Pseudomonas 
Pick.


DA1970 


Pseudomonas 
Pseud. 


DA1990 


Haemophilus 
Influ. 


DA2001 


Stenotro.
Maltoph.


DA2080 


Anacystis 
Nidulans 


DA2100 


Synechocystis 
Sp. 


DA2142 


Streptococcus 
Sal. 


DA2160 


Streptococcus 
Pn. 


DA2170 


E.Coli 


RA1661 


E.Coli 


RA1662 


Treponema 
Pallidum 


DA1271 


Helicobacter 
Pylo. 


DA1510 


E.Coli 


DA1661 


Haemophilus 
Influ. 


DA2000 


Synechocystis 
Sp. 


DA2141 


E.Coli 


RA1660 


Treponema 
Pallidum 


DA1270 


Synechocystis 
Sp. 


DA2140 



Amino Anti Domain 
acid codon 
3'-5' 



CAC 



CAU EUBACT. 



Species 


Access 
no. 


Methanococc 
us Jan. 


DV0652 


Sulfolobus 
Solfa. 


DV0860 


Halobacteriu 
m Cut. 


RV0382 


Haloferax 
Volcanii 


RV0501 


Archaeglobus 
Fulg. 


DV0340 


Methanococc 
us Jan. 


DV0650 


Halobacteriu 
m Cut. 


RV0380 


Halobacteriu 
m Cut. 


RV0381 


Haloferax 
Volcanii 


RV0500 


Mycoplasma 
Capric. 


DV1140 


Mycoplasma 
Mycoid. 


DV1180 


Mycoplasma 
Pneumo. 


DV1200 


Acholeplasma 
Laid.


DV1230 


Treponema 
Pallidum 


DV1271 


Borrelia 
Burgdorf. 


DV1280 


Mycobact. 
Tuberc. 


DV1400 


Staphylococ. 
Aure. 


DV1480 


Lactobac.
Bulgaric.


DV1500 


Helicobacter 
Pylo.


DV1511 


Bacillus 
Subtilis 


DV1540 


Bacillus Sp. 
Ps3 


DV1570 


E.Coli 


DV1660 


Azospirillum 
Lipo.


DV1720 


Haemophilus 
Influ. 


DV2000 


Synechocystis
Sp.


DV2140 


Synechococc 
us Sp. 


DV2150 


E.Coli 


RV1662 


Treponema 
Pallidum 


DV1270 


Streptomyces 
Liv.


DV1350 


Streptomyces 
Liv.


DV1351 


Helicobacter 
Pylo.


DV1510 


E.Coli 


DV1661 


E.Coli 


DV1662 


Haemophilus 
Influ. 


DV2001 



Amino Anti- Domain 
acid codon 
3'-5' 



GAU EUBACT. 



GAG 



Species 


Access 
no. 




Methanococ 
cus Jan. 


DL0652 


13 


Haloferax 
Volcanii 


RL0503 


14 


Mycoplasma 
Capric. 


DL1141 


15 


Mycoplasma 
Gen. 


DL1151 


16 


Mycoplasma 
Pneumo. 


DL1201 


17 


Mycoplasma 
Pg50 


DL1220 


18 


Acholeplas 
ma Laid. 


DL1231 


19 


Treponema 
Pallidum 


DL1273 


20 


Borrelia 
Burgdorf. 


DL1282 


1 


Staphylococ 
. Aure. 


DL1480 


2 


Helicobacter 
Pylo. 


DL1513 


3 


Bacillus 
Subtilis 


DL1544 


4 


E.Coli 


DL1661 


5 


Photobac. 
Leiogna. 


DL1750 


6 


Haemophilu 
s Influ. 


DL2004 


7 


Synechocys 
tis Sp. 


DL2140 


8 


Mycoplasma 
Capric. 


RL1142 


9 


Mycoplasma 
Gen. 


DL1152 


10 


Treponema 
Pallidum 


DL1271 




Borrelia 
Burgdorf. 


DL1280 


1 


Helicobacter 
Pylo. 


DL1512 


2 


Bacillus 
Subtilis 


DL1545 


3 


E.Coli 


DL1663 


4 


Haemophilu 
s Influ. 


DL2002 


5 


Synechocys 
tis Sp. 


DL2141 


6 


E.Coli 


RL1662 


1 


Treponema 
Pallidum 


DL1274 


2 


Bacillus 
Subtilis 


DL1540 


3 


E.Coli 


DL1660 


4 


Salmonella 
Typhi. 


DL1700 


5 


Mycobact.
Leprae


DL1710 


6 


Aeromonas 
Hydroph. 


DL1780 


7 


Rhizobium 
Meliloti 


DL1940 


8 


Bordetella 
Pertus. 


DL1980 


9 



Ami Anti- Domain 
no codon 
acid 3'-5' 



1 His^^ GUG ARCHAEA 



Species 


Access 
no. 


E.Coli 


DY1660 


E.Coli 


DY1661 


Pseudomonas 
Aer. 


DY1820 


Campylobac.
Jejuni


DY1860


Rickettsia
Prow.


DY1870


Haemophilus 
Influ. 


DY2000 


Synechocystis
Sp.


DY2140 


Bacillus 
Stearo. 


RY2120 


Plasmodium 
Falcip. 


DY7500


Trypanosoma 
Brucei 


DY7520 


Tetrahymena 
Therm.


DY7540 


Dictyostelium 
Dis. 


DY7570 


Saccharomyc 
es Cer. 


DY7630 


Saccharomyc 
es Cer. 


DY7631 


Leishmania 
Donava. 


DY7690 


Scenedesmus 
Obliq. 


RY7560 


Schizosaccha 
.Pom. 


RY7640 


Torulopsis 
Utilis 


RY7650 






Archaeglobus 
Fulg. 


DH0340 


Methanococc 
us Jan. 


DH0650


Methanococ. 
Vani. 


DH0660


Methanother 
m. Fer. 


DH0680


Halobacteriu 
m Cut. 


RH0380 


Haloferax 
Volcanii 


RH0500 


Mycoplasma 
Capric. 


DH1140 


Mycoplasma 
Gen. 


DH1150 


Mycoplasma 
Pneumo. 


DH1200 


Acholeplasma 
Laid. 


DH1230 


Treponema 
Pallidum 


DH1270 


Borrelia 
Burgdorf. 


DH1280 


Staphylococ. 
Aure. 


DH1480 


Helicobacter 
Pylo. 


DH1510 


Bacillus 
Subtilis 


DH1540 



Table IB. (continued)

Amino Anti Domain 
acid codon 
3'-5' 
1 CGU EUKARYA 



Species 


Access 
no. 


Saccharomyces
Cer.


DA7630 


Plasmodium 
Falcip. 


DA7500


Toxoplasma 
Gondii


DA7730



Amino Anti Domain 
acid codon 
3'-5' 



266 



9 
10 

266 



Species 


Access 
no. 


Synechocystis
Sp.


DV2141 


E.Coli 


RV1660


Bacillus 
Stearo. 


RV2120 



Amino Anti- Domain 
acid codon 
3'-5' 



9 
10 

11 

266 



Species 


Access 
no. 


Synechocys 
tis Sp. 


DL2144 


E.Coli 


RL1661 


Anacystis 
Nidulans 


RL2101 



Ami Anti- Domain 
no codon 
acid 3'-5' 



10 
11 
12 

265 



Species 


Access 
no. 


Bacillus 
Subtilis 


DH1541 


E.Coli 


DH1660


Salmonella 
Typhi. 


DH1700 



1063 



Amino 
acid 


Anticodon
3'-5'


Domain 


Species 


Access 
no. 



13 His" GUG EUBACT. 
14 
15 
16 

1 EUKARYA 

2 
3 
4 
5 
6 

1 Trp" ACC ARCHAEA 

2 

3 

4 

1 EUBACT. 

2 

3 

4 

5 



Photobact. 
Phosph. 


DH1740 


Aeromonas 
Hydroph. 


DH1780 


Haemophilus 
Influ. 


DH2000 


Synechocystis 
Sp. 


DH2140 


Plasmodium 
Falcip. 


DH7500 


Dictyostelium 
Dis. 


DH7570 


Saccharomyces
Cer.


DH7630 


Schizosaccha.
Pom.


DH7640 


Saccharomyces
Cer.


RH7630 


Saccharomyces
Cer.


RH7631 




Archaeglobus 
Fulg. 


DW0340 


Halobacterium 
Med. 


DW0460 


Haloferax 
Volcanii 


DW0500 


Methanococcus 

Jan. 


DW0650 


Thermotoga 
Marit. 


DW0990 


Mycoplasma 
Capric. 


DW1141 


Mycoplasma 
Gen. 


DW1150 


Mycoplasma 
Pneumo. 


DW1200 


Acholeplasma 
Laid. 


DW1230 





Amino 


Anti- 


Domain 




acid 


codon 
3'- 5' 




6 


His" 


GUG 




7 









9 

10 
11 
12 
13 
14 
15 
16 

1 
2 
3 
4 
5 
6 
7 



Species Access 
no. 

Spiroplasma Citri DW1251 

Treponema Pallidum DW1270 

Borrelia Burgdorf. DW1280 

Streptomyces Gris. DW1300 

Staphylococ. Aure. DW1480 

Helicobacter Pylo. DW1510 

Bacillus Subtilis DW1540 

E.Coli DW1660 

Rickettsia Prow. DW1870 

Haemophilus Influ. DW2000 

Synechocystis Sp. DW2140 

EUKARYA Plasmodium Falcip. DW7500 

Leishmania Tarent. DW7550 

Dictyostelium Dis. DW7570 

Saccharomyces Cer. DW7630 

Saccharomyces Cer. DW7631 

Schizosaccha.Pom. DW7640 

Toxoplasma Gondii DW7730



19 



18 



37 



1063 



1100 



Column headings: Amino acid; Anticodon (3'-5'); No. tRNA ABE; 5'-Acceptor stem; 5'-D-stem; D-loop; 3'-D-stem; 5'-Anticodon stem; Anticodon loop; 3'-Anticodon stem; Variable loop; 5'-TΨC-stem; TΨC-loop; 3'-TΨC-stem; 3'-Acceptor stem; Dis. Bases (no.). Base positions are numbered up to 73.



Type I tRNA, Core Group A 
figure 8
figure 9